Skip to content

Conversation

maksimsab
Copy link
Contributor

@maksimsab maksimsab commented Jul 8, 2025

This patch adds an Offload Wrapper for the SYCL kind. This is an essential step for SYCL offloading and the compilation flow. The usage of offload wrapping is added to the clang-linker-wrapper tool.

Modifications:
Implemented bundleSYCL() function to handle SYCL image bundling.
Implemented wrapSYCLBinaries() function that is invoked from clang-linker-wrapper.
SYCL Offload Wrapping uses specific data structures such as __sycl.tgt_device_image and __sycl.tgt_bin_desc. Each SYCL image maintains its own symbol table (unlike shared global tables in other targets). Therefore, symbols are encoded explicitly during the offload wrapping. Also, images refer to their own Offloading Entries arrays unlike other targets.
The proposed __sycl.tgt_device_image uses Version 3 to differentiate from images generated by Intel DPC++. The structure proposed in this patch doesn't have fields deprecated in DPC++.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' llvm:binary-utilities labels Jul 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 8, 2025

@llvm/pr-subscribers-llvm-binary-utilities
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Maksim Sabianin (maksimsab)

Changes

Patch is 34.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147508.diff

11 Files Affected:

  • (modified) clang/test/Driver/linker-wrapper-image.c (+35)
  • (modified) clang/test/Driver/linker-wrapper.c (+1-1)
  • (modified) clang/tools/clang-linker-wrapper/CMakeLists.txt (+1)
  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+40-1)
  • (added) llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h (+44)
  • (modified) llvm/include/llvm/Object/OffloadBinary.h (+6-3)
  • (modified) llvm/lib/Frontend/CMakeLists.txt (+1)
  • (added) llvm/lib/Frontend/SYCL/CMakeLists.txt (+14)
  • (added) llvm/lib/Frontend/SYCL/OffloadWrapper.cpp (+513)
  • (modified) llvm/lib/Object/OffloadBinary.cpp (+11)
  • (modified) llvm/unittests/Object/OffloadingTest.cpp (+9)
diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c
index c0de56d58196a..67bb21bfe49b4 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -1,6 +1,7 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
+// REQUIRES: spirv-registered-target
 
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
 
@@ -263,3 +264,37 @@
 //      HIP: while.end:
 // HIP-NEXT:   ret void
 // HIP-NEXT: }
+
+// RUN: clang-offload-packager -o %t.out --image=file=%t.elf.o,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
+// RUN:   -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
+// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=SYCL
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu -r \
+// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=SYCL
+
+//      SYCL: %__sycl.tgt_device_image = type { i16, i8, i8, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr }
+// SYCL-NEXT: %__sycl.tgt_bin_desc = type { i16, i16, ptr, ptr, ptr }
+
+//      SYCL: @.sycl_offloading.target.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.opts.compile.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.opts.link.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.0.data = internal unnamed_addr constant [0 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.0.info = internal local_unnamed_addr constant [2 x i64] [i64 ptrtoint (ptr @.sycl_offloading.0.data to i64), i64 0], section ".tgtimg", align 16
+// SYCL-NEXT: @llvm.used = appending global [1 x ptr] [ptr @.sycl_offloading.0.info], section "llvm.metadata"
+// SYCL-NEXT: @.sycl_offloading.device_images = internal unnamed_addr constant [1 x %__sycl.tgt_device_image] [%__sycl.tgt_device_image { i16 3, i8 8, i8 0, ptr @.sycl_offloading.target.0, ptr @.sycl_offloading.opts.compile.0, ptr @.sycl_offloading.opts.link.0, ptr @.sycl_offloading.0.data, ptr @.sycl_offloading.0.data, ptr null, ptr null, ptr null, ptr null }]
+// SYCL-NEXT: @.sycl_offloading.descriptor = internal constant %__sycl.tgt_bin_desc { i16 1, i16 1, ptr @.sycl_offloading.device_images, ptr null, ptr null }
+// SYCL-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @sycl.descriptor_reg, ptr null }]
+// SYCL-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @sycl.descriptor_unreg, ptr null }]
+
+//      SYCL: define internal void @sycl.descriptor_reg() section ".text.startup" {
+// SYCL-NEXT: entry:
+// SYCL-NEXT:   call void @__sycl_register_lib(ptr @.sycl_offloading.descriptor)
+// SYCL-NEXT:   ret void
+// SYCL-NEXT: }
+
+//      SYCL: define internal void @sycl.descriptor_unreg() section ".text.startup" {
+// SYCL-NEXT: entry:
+// SYCL-NEXT:   call void @__sycl_unregister_lib(ptr @.sycl_offloading.descriptor)
+// SYCL-NEXT:   ret void
+// SYCL-NEXT: }
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 80b1a5745a123..5ab8a09660e57 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -54,7 +54,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.spirv.bc,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
-// RUN: not clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=SPIRV-LINK
 
 // SPIRV-LINK: clang{{.*}} -o {{.*}}.img --target=spirv64-unknown-unknown {{.*}}.o --sycl-link -Xlinker -triple=spirv64-unknown-unknown -Xlinker -arch=
diff --git a/clang/tools/clang-linker-wrapper/CMakeLists.txt b/clang/tools/clang-linker-wrapper/CMakeLists.txt
index bf37d8031025e..741e3fbbefb74 100644
--- a/clang/tools/clang-linker-wrapper/CMakeLists.txt
+++ b/clang/tools/clang-linker-wrapper/CMakeLists.txt
@@ -16,6 +16,7 @@ set(LLVM_LINK_COMPONENTS
   CodeGen
   LTO
   FrontendOffloading
+  FrontendSYCL
   )
 
 set(LLVM_TARGET_DEFINITIONS LinkerWrapperOpts.td)
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 0f1fa8b329fd6..9a466d6e69c31 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -22,6 +22,7 @@
 #include "llvm/CodeGen/CommandFlags.h"
 #include "llvm/Frontend/Offloading/OffloadWrapper.h"
 #include "llvm/Frontend/Offloading/Utility.h"
+#include "llvm/Frontend/SYCL/OffloadWrapper.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DiagnosticPrinter.h"
 #include "llvm/IR/Module.h"
@@ -711,6 +712,13 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
             M, BuffersToWrap.front(), offloading::getOffloadEntryArray(M)))
       return std::move(Err);
     break;
+  case OFK_SYCL: {
+    offloading::sycl::SYCLWrappingOptions WrappingOptions;
+    if (Error Err = offloading::sycl::wrapSYCLBinaries(M, BuffersToWrap,
+                                                       WrappingOptions))
+      return Err;
+    break;
+  }
   default:
     return createStringError(getOffloadKindName(Kind) +
                              " wrapping is not supported");
@@ -748,6 +756,36 @@ bundleOpenMP(ArrayRef<OffloadingImage> Images) {
   return std::move(Buffers);
 }
 
+Expected<SmallVector<std::unique_ptr<MemoryBuffer>>>
+bundleSYCL(ArrayRef<OffloadingImage> Images) {
+  SmallVector<std::unique_ptr<MemoryBuffer>> Buffers;
+  if (DryRun) {
+    // In dry-run mode there is an empty input which is insufficient for
+    // the testing. Therefore, we insert a stub value.
+    OffloadBinary::OffloadingImage Image;
+    Image.TheOffloadKind = OffloadKind::OFK_SYCL;
+    Image.Image = MemoryBuffer::getMemBufferCopy("");
+    SmallString<0> SerializedImage = OffloadBinary::write(Image);
+    Buffers.emplace_back(MemoryBuffer::getMemBufferCopy(SerializedImage));
+    return Buffers;
+  }
+
+  for (const OffloadingImage &TheImage : Images) {
+    SmallVector<OffloadFile> OffloadBinaries;
+    if (Error E = extractOffloadBinaries(*TheImage.Image, OffloadBinaries))
+      return E;
+
+    for (const OffloadFile &File : OffloadBinaries) {
+      const OffloadBinary &Binary = *File.getBinary();
+      SmallString<0> SerializedImage =
+          OffloadBinary::write(Binary.getOffloadingImage());
+      Buffers.emplace_back(MemoryBuffer::getMemBufferCopy(SerializedImage));
+    }
+  }
+
+  return Buffers;
+}
+
 Expected<SmallVector<std::unique_ptr<MemoryBuffer>>>
 bundleCuda(ArrayRef<OffloadingImage> Images, const ArgList &Args) {
   SmallVector<std::pair<StringRef, StringRef>, 4> InputFiles;
@@ -800,8 +838,9 @@ bundleLinkedOutput(ArrayRef<OffloadingImage> Images, const ArgList &Args,
   llvm::TimeTraceScope TimeScope("Bundle linked output");
   switch (Kind) {
   case OFK_OpenMP:
-  case OFK_SYCL:
     return bundleOpenMP(Images);
+  case OFK_SYCL:
+    return bundleSYCL(Images);
   case OFK_Cuda:
     return bundleCuda(Images, Args);
   case OFK_HIP:
diff --git a/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h b/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h
new file mode 100644
index 0000000000000..f89411c86984d
--- /dev/null
+++ b/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h
@@ -0,0 +1,44 @@
+//===----- OffloadWrapper.h -------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
+#define LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/Object/OffloadBinary.h"
+
+#include <string>
+
+namespace llvm {
+
+class Module;
+
+namespace offloading {
+namespace sycl {
+
+struct SYCLWrappingOptions {
+  // target/compiler specific options what are suggested to use to "compile"
+  // program at runtime.
+  std::string CompileOptions;
+  // Target/Compiler specific options that are suggested to use to "link"
+  // program at runtime.
+  std::string LinkOptions;
+};
+
+/// Wraps OffloadBinaries in the given \p Buffers into the module \p M
+/// as global symbols and registers the images with the SYCL Runtime.
+/// \param Options Settings that allows to turn on optional data and settings.
+llvm::Error
+wrapSYCLBinaries(llvm::Module &M, llvm::ArrayRef<llvm::ArrayRef<char>> Buffers,
+                 SYCLWrappingOptions Options = SYCLWrappingOptions());
+
+} // namespace sycl
+} // namespace offloading
+} // namespace llvm
+
+#endif // LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
diff --git a/llvm/include/llvm/Object/OffloadBinary.h b/llvm/include/llvm/Object/OffloadBinary.h
index b5c845fa8eb70..9d137db834f08 100644
--- a/llvm/include/llvm/Object/OffloadBinary.h
+++ b/llvm/include/llvm/Object/OffloadBinary.h
@@ -48,6 +48,7 @@ enum ImageKind : uint16_t {
   IMG_Cubin,
   IMG_Fatbinary,
   IMG_PTX,
+  IMG_SPIRV,
   IMG_LAST,
 };
 
@@ -70,9 +71,9 @@ class OffloadBinary : public Binary {
 
   /// The offloading metadata that will be serialized to a memory buffer.
   struct OffloadingImage {
-    ImageKind TheImageKind;
-    OffloadKind TheOffloadKind;
-    uint32_t Flags;
+    ImageKind TheImageKind = ImageKind::IMG_None;
+    OffloadKind TheOffloadKind = OffloadKind::OFK_None;
+    uint32_t Flags = 0;
     MapVector<StringRef, StringRef> StringData;
     std::unique_ptr<MemoryBuffer> Image;
   };
@@ -84,6 +85,8 @@ class OffloadBinary : public Binary {
   /// Serialize the contents of \p File to a binary buffer to be read later.
   LLVM_ABI static SmallString<0> write(const OffloadingImage &);
 
+  OffloadingImage getOffloadingImage() const;
+
   static uint64_t getAlignment() { return 8; }
 
   ImageKind getImageKind() const { return TheEntry->TheImageKind; }
diff --git a/llvm/lib/Frontend/CMakeLists.txt b/llvm/lib/Frontend/CMakeLists.txt
index 3b31e6f8dec96..6c4b8362c04fd 100644
--- a/llvm/lib/Frontend/CMakeLists.txt
+++ b/llvm/lib/Frontend/CMakeLists.txt
@@ -5,3 +5,4 @@ add_subdirectory(HLSL)
 add_subdirectory(OpenACC)
 add_subdirectory(OpenMP)
 add_subdirectory(Offloading)
+add_subdirectory(SYCL)
diff --git a/llvm/lib/Frontend/SYCL/CMakeLists.txt b/llvm/lib/Frontend/SYCL/CMakeLists.txt
new file mode 100644
index 0000000000000..355ae5f7955a8
--- /dev/null
+++ b/llvm/lib/Frontend/SYCL/CMakeLists.txt
@@ -0,0 +1,14 @@
+add_llvm_component_library(LLVMFrontendSYCL
+  OffloadWrapper.cpp
+
+  ADDITIONAL_HEADER_DIRS
+  ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend
+  ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend/SYCL
+
+  LINK_COMPONENTS
+  Core
+  FrontendOffloading
+  Object
+  Support
+  TransformUtils
+  )
diff --git a/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp b/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp
new file mode 100644
index 0000000000000..c0d160c39e93d
--- /dev/null
+++ b/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp
@@ -0,0 +1,513 @@
+//===- SYCLOffloadWrapper.cpp -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Frontend/SYCL/OffloadWrapper.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/Twine.h"
+#include "llvm/Frontend/Offloading/Utility.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/Type.h"
+#include "llvm/Object/OffloadBinary.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/LineIterator.h"
+#include "llvm/Support/MemoryBufferRef.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
+
+#include <memory>
+#include <string>
+#include <utility>
+
+using namespace llvm;
+using namespace llvm::object;
+using namespace llvm::offloading;
+using namespace llvm::offloading::sycl;
+
+using OffloadingImage = OffloadBinary::OffloadingImage;
+
+namespace {
+
+/// Wrapper helper class that creates all LLVM IRs wrapping given images.
+struct Wrapper {
+  Module &M;
+  LLVMContext &C;
+  SYCLWrappingOptions Options;
+
+  StructType *EntryTy = nullptr;
+  StructType *SyclDeviceImageTy = nullptr;
+  StructType *SyclBinDescTy = nullptr;
+
+  Wrapper(Module &M, const SYCLWrappingOptions &Options)
+      : M(M), C(M.getContext()), Options(Options) {
+
+    EntryTy = offloading::getEntryTy(M);
+    SyclDeviceImageTy = getSyclDeviceImageTy();
+    SyclBinDescTy = getSyclBinDescTy();
+  }
+
+  IntegerType *getSizeTTy() {
+    switch (M.getDataLayout().getPointerSize()) {
+    case 4:
+      return Type::getInt32Ty(C);
+    case 8:
+      return Type::getInt64Ty(C);
+    }
+    llvm_unreachable("unsupported pointer type size");
+  }
+
+  SmallVector<Constant *, 2> getSizetConstPair(size_t First, size_t Second) {
+    IntegerType *SizeTTy = getSizeTTy();
+    return SmallVector<Constant *, 2>{ConstantInt::get(SizeTTy, First),
+                                      ConstantInt::get(SizeTTy, Second)};
+  }
+
+  /// Note: Properties aren't supported and the support is going
+  /// to be added later.
+  /// Creates a structure corresponding to:
+  /// SYCL specific image descriptor type.
+  /// \code
+  /// struct __sycl.tgt_device_image {
+  ///   // version of this structure - for backward compatibility;
+  ///   // all modifications which change order/type/offsets of existing fields
+  ///   // should increment the version.
+  ///   uint16_t Version;
+  ///   // the kind of offload model the image employs.
+  ///   uint8_t OffloadKind;
+  ///   // format of the image data - SPIRV, LLVMIR bitcode, etc
+  ///   uint8_t Format;
+  ///   // null-terminated string representation of the device's target
+  ///   // architecture
+  ///   const char *Arch;
+  ///   // a null-terminated string; target- and compiler-specific options
+  ///   // which are suggested to use to "compile" program at runtime
+  ///   const char *CompileOptions;
+  ///   // a null-terminated string; target- and compiler-specific options
+  ///   // which are suggested to use to "link" program at runtime
+  ///   const char *LinkOptions;
+  ///   // Pointer to the device binary image start
+  ///   void *ImageStart;
+  ///   // Pointer to the device binary image end
+  ///   void *ImageEnd;
+  ///   // the entry table
+  ///   __tgt_offload_entry *EntriesBegin;
+  ///   __tgt_offload_entry *EntriesEnd;
+  ///   const char *PropertiesBegin;
+  ///   const char *PropertiesEnd;
+  /// };
+  /// \endcode
+  StructType *getSyclDeviceImageTy() {
+    return StructType::create(
+        {
+            Type::getInt16Ty(C),       // Version
+            Type::getInt8Ty(C),        // OffloadKind
+            Type::getInt8Ty(C),        // Format
+            PointerType::getUnqual(C), // Arch
+            PointerType::getUnqual(C), // CompileOptions
+            PointerType::getUnqual(C), // LinkOptions
+            PointerType::getUnqual(C), // ImageStart
+            PointerType::getUnqual(C), // ImageEnd
+            PointerType::getUnqual(C), // EntriesBegin
+            PointerType::getUnqual(C), // EntriesEnd
+            PointerType::getUnqual(C), // PropertiesBegin
+            PointerType::getUnqual(C)  // PropertiesEnd
+        },
+        "__sycl.tgt_device_image");
+  }
+
+  /// Creates a structure for SYCL specific binary descriptor type. Corresponds
+  /// to:
+  ///
+  /// \code
+  ///  struct __sycl.tgt_bin_desc {
+  ///    // version of this structure - for backward compatibility;
+  ///    // all modifications which change order/type/offsets of existing fields
+  ///    // should increment the version.
+  ///    uint16_t Version;
+  ///    uint16_t NumDeviceImages;
+  ///    __sycl.tgt_device_image *DeviceImages;
+  ///    // the offload entry table
+  ///    __tgt_offload_entry *HostEntriesBegin;
+  ///    __tgt_offload_entry *HostEntriesEnd;
+  ///  };
+  /// \endcode
+  StructType *getSyclBinDescTy() {
+    return StructType::create(
+        {Type::getInt16Ty(C), Type::getInt16Ty(C), PointerType::getUnqual(C),
+         PointerType::getUnqual(C), PointerType::getUnqual(C)},
+        "__sycl.tgt_bin_desc");
+  }
+
+  /// Adds a global readonly variable that is initialized by given
+  /// \p Initializer to the module.
+  GlobalVariable *addGlobalArrayVariable(const Twine &Name,
+                                         ArrayRef<char> Initializer,
+                                         const Twine &Section = "") {
+    auto *Arr = ConstantDataArray::get(M.getContext(), Initializer);
+    auto *Var = new GlobalVariable(M, Arr->getType(), /*isConstant*/ true,
+                                   GlobalVariable::InternalLinkage, Arr, Name);
+    Var->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+
+    SmallVector<char, 32> NameBuf;
+    auto SectionName = Section.toStringRef(NameBuf);
+    if (!SectionName.empty())
+      Var->setSection(SectionName);
+    return Var;
+  }
+
+  /// Adds given \p Buf as a global variable into the module.
+  /// \returns Pair of pointers that point at the beginning and the end of the
+  /// variable.
+  std::pair<Constant *, Constant *>
+  addArrayToModule(ArrayRef<char> Buf, const Twine &Name,
+                   const Twine &Section = "") {
+    auto *Var = addGlobalArrayVariable(Name, Buf, Section);
+    auto *ImageB = ConstantExpr::getGetElementPtr(Var->getValueType(), Var,
+                                                  getSizetConstPair(0, 0));
+    auto *ImageE = ConstantExpr::getGetElementPtr(
+        Var->getValueType(), Var, getSizetConstPair(0, Buf.size()));
+    return std::make_pair(ImageB, ImageE);
+  }
+
+  /// Adds given \p Data as constant byte array in the module.
+  /// \returns Constant pointer to the added data. The pointer type does not
+  /// carry size information.
+  Constant *addRawDataToModule(ArrayRef<char> Data, const Twine &Name) {
+    auto *Var = addGlobalArrayVariable(Name, Data);
+    auto *DataPtr = ConstantExpr::getGetElementPtr(Var->getValueType(), Var,
+                                                   getSizetConstPair(0, 0));
+    return DataPtr;
+  }
+
+  /// Creates a global variable of const char* type and creates an
+  /// initializer that initializes it with \p Str.
+  ///
+  /// \returns Link-time constant pointer (constant expr) to that
+  /// variable.
+  Constant *addStringToModule(StringRef Str, const Twine &Name) {
+    auto *Arr = ConstantDataArray::getString(C, Str);
+    auto *Var = new GlobalVariable(M, Arr->getType(), /*isConstant*/ true,
+                                   GlobalVariable::InternalLinkage, Arr, Name);
+    Var->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+    auto *Zero = ConstantInt::get(getSizeTTy(), 0);
+    Constant *ZeroZero[] = {Zero, Zero};
+    return ConstantExpr::getGetElementPtr(Var->getValueType(), Var, ZeroZero);
+  }
+
+  /// Creates a global variable of array of structs and initializes
+  /// it with the given values in \p ArrayData.
+  ///
+  /// \returns Pair of Constants that point at array content.
+  /// If \p ArrayData is empty then a returned pair contains nullptrs.
+  std::pair<Constant *, Constant *>
+  addStructArrayToModule(ArrayRef<Constant *> ArrayData, Type *ElemTy) {
+    if (ArrayData.empty()) {
+      auto *PtrTy = llvm::PointerType::getUnqual(ElemTy->getContext());
+      auto *NullPtr = Constant::getNullValue(PtrTy)...
[truncated]

@maksimsab
Copy link
Contributor Author

Hi @jdoerfert @jhuber6 !

We are adding offload wrapping for SYCL in this PR. Do you have any suggestions for whom I should request for a code review?

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate directory for the offload wrapper?

// SYCL-NEXT: @.sycl_offloading.0.info = internal local_unnamed_addr constant [2 x i64] [i64 ptrtoint (ptr @.sycl_offloading.0.data to i64), i64 0], section ".tgtimg", align 16
// SYCL-NEXT: @llvm.used = appending global [1 x ptr] [ptr @.sycl_offloading.0.info], section "llvm.metadata"
// SYCL-NEXT: @.sycl_offloading.device_images = internal unnamed_addr constant [1 x %__sycl.tgt_device_image] [%__sycl.tgt_device_image { i16 3, i8 8, i8 0, ptr @.sycl_offloading.target.0, ptr @.sycl_offloading.opts.compile.0, ptr @.sycl_offloading.opts.link.0, ptr @.sycl_offloading.0.data, ptr @.sycl_offloading.0.data, ptr null, ptr null, ptr null, ptr null }]
// SYCL-NEXT: @.sycl_offloading.descriptor = internal constant %__sycl.tgt_bin_desc { i16 1, i16 1, ptr @.sycl_offloading.device_images, ptr null, ptr null }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just copied from OpenMP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RUN lines are copied from OpenMP and adjusted to SYCL case. Output's checks are copied from the clang-linker-wrapper's output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm just wondering why everything's called tgt_bin_desc and similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, we customized old clang-offload-wrapper and then we extracted SYCL logic in a separate place. That is how these names stuck with us.

@maksimsab
Copy link
Contributor Author

@jhuber6
I had a choice to put it the common OffloadWrapper.{h,cpp} area. But SYCL's offload wrapping looks very different from others and I decided not to mix them together. Hence, I was choosing between llvm/Frontend/Offloading/SYCLOffloadWrapper.h and llvm/Frontend/SYCL/OffloadWrapper.h. If putting it together in llvm/Frontend/Offloading/OffloadWrapper.h looks better to you then I don't object to that.

@jhuber6
Copy link
Contributor

jhuber6 commented Jul 14, 2025

@jhuber6 I had a choice to put it the common OffloadWrapper.{h,cpp} area. But SYCL's offload wrapping looks very different from others and I decided not to mix them together. Hence, I was choosing between llvm/Frontend/Offloading/SYCLOffloadWrapper.h and llvm/Frontend/SYCL/OffloadWrapper.h. If putting it together in llvm/Frontend/Offloading/OffloadWrapper.h looks better to you then I don't object to that.

OpenMP and HIP don't look similar either but I figured it was easier to just put them all together because they're basically exporting the same interface.

@maksimsab
Copy link
Contributor Author

I've merged SYCL wrapper with all others.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this stuff not common with the other targets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a divergence with offloading::getOffloadEntryArray. Other targets don't set char *SymbolName for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate a bit more? Other targets definitely set the symbol name when they call emitOffloadingEntry. We shouldn't be duplicating functionality here. Is there something preventing you from just updating the other function with whatever you need?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entries could be emitted by emitOffloadEntry. Then I should create an array that refers to these entries.
I don't see the appropriate other function that does that. The most closest one is offloading::getOffloadEntryArray but this function creates an empty array if I get it right. Therefore, I would need to modify that function in a strange way.
If you have other opinion regarding getOffloadEntryArray please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getOffloadEntryArray definitely does not create an empty array. It creates a pair of symbols that will let you iterate the offloading entries emitted in each TU once the linker combines them into a single section.

Copy link
Contributor

@jhuber6 jhuber6 Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, our compilation flow consists of steps:
Link all modules together in Module LM.
Then split LM into several parts SM_1, ..., SM_N.

Could you explain this a bit further? Trivially you could just put some metadata in the offloading entries and skip over them as-needed in your wrapping code. I fail to see how that's a blocker, though it might be slightly annoying if you split them out later. I don't have the full view of how you choose to do your splitting. Another option, and one I do for relocatable links, is to just rename the section.

Also, the mentioned Mac OS case concerns me.

There is a way to do it in Mac OS, I just haven't gotten around to writing it, there's no incentive since GPU offloading on Mac-OS is effectively dead as far as I'm aware.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's consider the following example.
One TU has 2 kernels (A, B) and the user requests a split by kernels.
In unified approach we would have the output from the Codegen like the following:

kernel A() {...}
kernel B() {...}

@entry_A = %tgt_offload_entry {...} ; info about kernel A
@entry_B = %tgt_offload_entry {...} ; info about kernel B. 

Then we have to split these kernels using splitModuleTransitiveFromEntryPoints added in #131347. In order to correctly move these entries, we have to add such SYCL specific handling functionality in splitModuleTransitiveFromEntryPoints, which is very discouraged according to my understanding. That is only my concern about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the SYCL workflow unfortunately. If we are splitting the module (serveral IR blobs which may or may not contain the requested function) we could possibly just try each entry for each image. It's N^2 technically but I feel like these lists will be small enough that it won't blow up. I.e.

for image in images:
  for entry in entries:
    if (entry.name in image.symbols)
      register(entry, image);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joseph, just for your information. We are currently experimenting/discussing internally on how to better address your feedback. We'll get back to you...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhuber6, we replaced duplicated code with usage of emitOffloadingEntry and updated name and description of addOffloadEntriesToModule (now it is initOffloadEntriesPerImage) to avoid confusion with getOffloadEntryArray, because initially similar names created perception that the functions are doing the same, but in reality, they are different.

We need to initialize offload entries per image, because, unlike other targets, each SYCL device image contains its own set of symbols, which may contain different symbols than other images. The reason is per-kernel device code split, which solves the following problems:

  1. SYCL 2020 spec requirement: “An implementation may not raise a compile time diagnostic or a run time exception merely due to speculative compilation of a kernel for a device when the application does not actually submit the kernel to that device.” To satisfy this requirement, we split kernels into different images based on the features required.
  2. Compilation overhead. There are use cases, when module contains 1000s of kernels. Providing only specific symbols in each of the images allows to compile/run 1 specific kernel/TU, so no need to JIT compile all 1000s.

Does that address all your concerns for this PR?

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this and removing at least some of the code duplication. So, let me make sure I'm understanding this correctly. The other offloading targets have a big array of entries that apply to every image embedded in the device. The clang compiler emits these while it's compiling each TU itself.

SYCL on the other hand seems to do this all at the end and generates these arrays as-needed. Images are only needed for registration, so I'm still wondering if there isn't some way to just indicate which entries are to be used for which kernels. If I were doing this on the other targets it would just be a symbol lookup.

Could we at least put these in the same section as the other target's offloading entries? I'm also not entirely sure what the heuristic is for determining which symbols go where.

llvm::StringRef Suffix = "",
bool EmitSurfacesAndTextures = true);

struct SYCLWrappingOptions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the members description this structure holds options for the runtime compiler (aka JIT compiler). I think it's better to reflect in the structure name.

Suggested change
struct SYCLWrappingOptions {
struct SYCLJITOptions {

Or maybe even just "JITOptions". These are specific to the device code format rather than a programming model. If we use some intermediate representation (e.g. SPIR-V), which just be compiled to the executable format, these options are useful to configure the runtime compilation step.
@sarnex, does OpenMP offload to SPIR-V compiler allows setting JIT compiler options? If so, I think this structure should have a neutral name like "JITOptions" and re-used by different offloading modes.

Copy link
Member

@sarnex sarnex Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's supported and the generated device ELF contains it but I didn't hook it up to anything from the driver yet, see here.

Now Offload Wrapper handles the binary blob which contains sycl images.
Also, the issue with duplicated entry names strings is resolved.
SYCLWrappingOptions are renamed to SYCLJITOptions.
Copy link
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, also fix CI, if it is caused by your changes.

Rollback unittest to resolve CI.
Clean leftovers and unify names.
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally fine from my side, I'll leave it to the Intel people to understand the specifics.

ImgInfoVar->setAlignment(
MaybeAlign(M.getDataLayout().getTypeStoreSize(IntPtrTy) * 2u));
ImgInfoVar->setUnnamedAddr(GlobalValue::UnnamedAddr::Local);
ImgInfoVar->setSection(".tgtimg");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Significance of this section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is unnecessary. I will take one more look for other useless pieces like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this whole function unnecessary. So I removed that.

@sarnex
Copy link
Member

sarnex commented Sep 26, 2025

@maksimsab Can you ping me if this is really ready for review? Thanks

Copy link
Member

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, all my comments can be addressed in a separate pr or as part of the next iteration of this change. if you are ready otherwise we can merge this before replying to any of my comments

Comment on lines +59 to +60
// Target/compiler specific options that are suggested to use to "compile"
// program at runtime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit that can be addressed in follow-up pr, this comment might be clearer, these options arent really a suggestion, they must be passed to the gpu driver

Suggested change
// Target/compiler specific options that are suggested to use to "compile"
// program at runtime.
// Target/compiler specific options that are passed to the device compiler
// at runtime.

/// \param Options Data that needs to be encoded for the later use in a runtime.
LLVM_ABI llvm::Error
wrapSYCLBinaries(llvm::Module &M, llvm::ArrayRef<char> Buffer,
SYCLJITOptions Options = SYCLJITOptions());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit that can be addressed in follow-up: are there any cases where a caller wouldnt want to pass SYCLJitOptions? AOT mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, AOT mode is a case when it is too late to pass any options.

}

/// SYCLWrapper helper class that creates all LLVM IRs wrapping given images.
struct SYCLWrapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
struct SYCLWrapper {
class SYCLWrapper {

/// // null-terminated string representation of the device's target
/// // architecture
/// const char *Arch;
/// // a null-terminated string; target- and compiler-specific options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit about the suggested descirption here, if we change the comment lets change both places

Comment on lines +702 to +703
/// const char *PropertiesBegin;
/// const char *PropertiesEnd;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont really understand why these are const char* but i think we are planning to rework these anyway so i guess it doesnt matter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed some "pointer type" here for now either const char* or const void*. Later, it would be changed to a pointer to appropriate structure.

// Note: Intel DPC++ compiler had 2 versions of this structure
// and clang++ has a third different structure. To avoid ABI incompatibility
// between generated device images the Version here starts from 3.
constexpr uint16_t DeviceImageStructVersion = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we use uint16 for this? is it what we use in our fork?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only can guess that the author was trying to save some space.

initOffloadEntriesPerImage(OB.getString("symbols"), OffloadKindTag);
Constant *WrappedBinary = ConstantStruct::get(
SyclDeviceImageTy, Version, OffloadKindConstant, ImageKindConstant,
TripleConstant, CompileOptions, LinkOptions, Binary.first,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the first and second usages are kinda unclear here, if they all follow some rule could we add a comment, and if not, could we add an inline comment for each one explaining what it is?

Comment on lines +948 to +951
/// extern const char *CompileOptions = "...";
/// ...
/// __attribute__((visibility("hidden")))
/// extern const char *LinkOptions = "...";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here these look like these is one set of compile/link opts for all images, but thats not necessarily true, for example we allow different device optimization options per-TU (as in a.cpp with O0 and b.cpp with O3). we can deal with this later though

GlobalVariable *Desc = W.createFatbinDesc(OffloadFiles);
if (!Desc)
return createStringError(inconvertibleErrorCode(),
"No binary descriptors created.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this is more useful to the user?

Suggested change
"No binary descriptors created.");
"Error creating binary descriptors.");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, both options don't follow LLVM's guideline because we don't add any useful piece of information.

/// \param Flags Flags associated with the entry.
/// \param Data Extra data storage associated with the entry.
/// \param SectionName The section this entry will be placed at.
/// \param AuxAddr An extra pointer if needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we explain what this means more? when do we need it

@maksimsab
Copy link
Contributor Author

@sarnex Hi. It is ready.

@sarnex
Copy link
Member

sarnex commented Sep 26, 2025

Once CI passes I will merge

@sarnex sarnex enabled auto-merge (squash) September 26, 2025 16:34
@sarnex sarnex merged commit 65d730b into llvm:main Sep 26, 2025
9 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
This patch adds an Offload Wrapper for the SYCL kind. This is an
essential step for SYCL offloading and the compilation flow. The usage
of offload wrapping is added to the clang-linker-wrapper tool.

Modifications:
Implemented `bundleSYCL()` function to handle SYCL image bundling.
Implemented `wrapSYCLBinaries()` function that is invoked from
clang-linker-wrapper.
SYCL Offload Wrapping uses specific data structures such as
`__sycl.tgt_device_image` and `__sycl.tgt_bin_desc`. Each SYCL image
maintains its own symbol table (unlike shared global tables in other
targets). Therefore, symbols are encoded explicitly during the offload
wrapping. Also, images refer to their own Offloading Entries arrays
unlike other targets.
The proposed `__sycl.tgt_device_image` uses Version 3 to differentiate
from images generated by Intel DPC++. The structure proposed in this
patch doesn't have fields deprecated in DPC++.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category llvm:binary-utilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants