
Conversation

@juliusikkala
Contributor

Direct-to-LLVM IR target

TL;DR: This PR implements a new LLVM IR target which does not go through C++ as CPU targets have previously done. It translates Slang IR directly into LLVM IR.

This target is documented in docs/llvm-target.md, so I am not repeating the details here.

Benefits from the user perspective:

  • Potentially faster compile times (by skipping the extra step of C++)
  • Support for arbitrary type layouts (e.g. scalar layout & std430)
  • Better debug info, allowing native-feeling debugging of Slang code
  • Potential performance improvements through ability to communicate Slang semantics as LLVM attributes (e.g. noalias for out/inout params)

Benefits from a compiler developer perspective:

  • No reliance on a downstream compiler being available on the system (like Clang, G++ or MSVC)
  • Slang-on-CPU semantics are no longer bound by what is expressible in C++
  • Portable vectorization & FP16 support via LLVM IR intrinsics
  • Allows us to take advantage of various LLVM passes (like DeSPMD)
  • The general vibe of Slang no longer being a "mere" transpiler on CPU

Brief code overview

The LLVM target is split into two parts: the Emitter (source/slang/slang-emit-llvm.cpp) and the Builder (source/slang-llvm/slang-llvm-builder.cpp).

This split exists because we effectively cannot access both Slang IR and LLVM IR in the same source file: LLVM is fully optional in Slang, so slang-compiler.dll cannot link to it and therefore cannot use LLVM APIs such as their IRBuilder. Conversely, slang-llvm.dll should not link to slang-compiler.dll, in order to avoid cyclic linking and debug/release build-compatibility issues on Windows.

Emitter

The Emitter side works similarly to the other slang-emit-*.cpp files. It handles Slang IR and constructs LLVM IR using the Builder.

The main class LLVMEmitter handles traversing Slang IR and emitting instructions. The central function here is emitLLVMInstruction().

LLVMTypeTranslator handles types and their layout rules. getType() returns the ordinary LLVM types, while getDebugType() returns debug-info versions that carry far more metadata.
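
For illustration only, here is a self-contained sketch of the overall shape on the Emitter side: dispatch on a Slang IR opcode, emit one LLVM instruction through the Builder, and hand back an opaque handle. Every type and method below is a stand-in, not this PR's actual API:

    // Stand-ins only: the real IRInst, ILLVMBuilder, and opcode set are far richer.
    typedef void* ValueHandle;            // opaque handle handed out by the Builder

    enum class Opcode { Add, Mul };       // stand-in for Slang IR opcodes

    struct Instruction                    // stand-in for Slang's IR instruction type
    {
        Opcode op;
        ValueHandle operands[2];
    };

    struct Builder                        // stand-in for the ILLVMBuilder interface
    {
        ValueHandle emitAdd(ValueHandle a, ValueHandle b);
        ValueHandle emitMul(ValueHandle a, ValueHandle b);
    };

    // Mirrors the shape of emitLLVMInstruction(): dispatch on the opcode, emit one
    // LLVM instruction through the Builder, and return an opaque handle.
    ValueHandle emitInstruction(Builder& builder, const Instruction& inst)
    {
        switch (inst.op)
        {
        case Opcode::Add: return builder.emitAdd(inst.operands[0], inst.operands[1]);
        case Opcode::Mul: return builder.emitMul(inst.operands[0], inst.operands[1]);
        }
        return nullptr;
    }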

Builder

The Builder side exposes a COM interface, ILLVMBuilder, which can be used to construct LLVM IR. It does a similar job to llvm::IRBuilder, but is streamlined to keep the interface small. It also handles code generation. It hands LLVM instructions and types to the Emitter as opaque pointers.

Internally, the Builder uses LLVM's IRBuilder and DIBuilder interfaces. An alternative approach would have been to generate the LLVM IR in textual form ourselves; however, that is brittle when updating LLVM versions. By using LLVM's APIs, we are much more likely to get compile errors when LLVM IR changes in an update, which makes this approach more maintainable.
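
As a rough sketch of that pattern (not the PR's actual ILLVMBuilder interface), a Builder method can wrap llvm::IRBuilder and hand the resulting instruction back to the Emitter as an opaque pointer:

    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"

    // Opaque handle for LLVM values/blocks as seen from slang-compiler,
    // which cannot include LLVM headers.
    typedef void* LLVMInstHandle;

    class BuilderSketch
    {
    public:
        BuilderSketch(llvm::LLVMContext& context)
            : m_builder(context)
        {}

        // The Emitter would position the builder through methods like this one.
        void setInsertPoint(LLVMInstHandle block)
        {
            m_builder.SetInsertPoint(static_cast<llvm::BasicBlock*>(block));
        }

        // Hypothetical analogue of a single ILLVMBuilder method: emit one integer
        // add (assuming an insert point has been set) and return it opaquely.
        LLVMInstHandle emitAdd(LLVMInstHandle left, LLVMInstHandle right)
        {
            llvm::Value* result = m_builder.CreateAdd(
                static_cast<llvm::Value*>(left),
                static_cast<llvm::Value*>(right));
            return result;
        }

    private:
        llvm::IRBuilder<> m_builder;
    };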

Architecturally, the Builder is quite straightforward: most functions just emit one LLVM instruction, although they often smooth over corner cases and inconsistencies in LLVM's API. The more interesting functions are the ones doing codegen:

  • emitInlineIRFunction()
  • emitComputeEntryPointWorkGroup()
  • emitComputeEntryPointDispatcher()
  • generateAssembly()
  • generateObjectCode()
  • generateJITLibrary()

Meta discussion

I have worked on this PR for a long while now, but I'm not particularly attached to any of the architectural choices here. I'm open to any suggestions; I'd rather address any concerns than rush this PR, and I'm fine with large refactors.

I do realize that this is a massive PR and a lot to chew on for potential reviewers. Let me know if you'd rather I split this up into separate PRs, e.g. as follows:

  1. Builder addition
  2. Emitter addition
  3. LLVM intrinsics in *.meta.slang
  4. test enablement

I chose to begin with this big all-in-one PR because it's testable as a whole, whereas partial implementations would not be able to run tests.

Future work & plans

This section is "optional" reading; it doesn't affect this PR directly.

These are items that are left unimplemented in this PR to limit the scope. Some of these are long-term "wishlist" features, which become much more attainable with the direct-to-LLVM emitter.

These are in a rough order of importance to me. I'm planning to work on some of this myself. If someone else wants to do any of the things listed below, that's also cool by me; I can review and help with that.

Texture and acceleration structure support

The problem with these types is that there's no "objective" or even "de-facto" memory layout for them on CPU:

  • Textures: Z-order curve or not? NPOT textures? Mipmaps?
  • TLAS: Embree? Some BVH from a physics library the user has already integrated in their game?

The latter especially worries me, since one known use case for Slang's CPU backends is particle simulation (#8244), where one might want to do collision testing against the scene using an AS. It would be good to allow users to take advantage of the acceleration structures of their physics engines.

I think these features could be implemented by lowering their types to opaque pointers and forward-declaring all sampling / tracing functions when they are used. It is then up to the user to link in proper implementations for the operations they use. slang-rt could perhaps provide some default implementations.
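
Purely as an illustration of the idea (none of these names or signatures are defined by this PR), an application-side implementation linked against the generated code could look roughly like this, with the texture behind an opaque pointer:

    #include <cmath>

    struct Float4 { float x, y, z, w; };

    // Opaque handle: the generated code only ever sees a pointer, so the app is
    // free to put whatever texture representation it wants behind it.
    struct AppTexture2D
    {
        int width, height;
        const Float4* texels; // trivially laid-out example data
    };

    // Hypothetical forward-declared sampling function that the generated LLVM
    // code would call; the application links in the body.
    extern "C" Float4 slangSampleTexture2D(const void* texture, float u, float v)
    {
        const AppTexture2D* tex = static_cast<const AppTexture2D*>(texture);
        int x = (int)std::floor(u * tex->width);
        int y = (int)std::floor(v * tex->height);
        x = x < 0 ? 0 : (x >= tex->width ? tex->width - 1 : x);
        y = y < 0 ? 0 : (y >= tex->height ? tex->height - 1 : y);
        return tex->texels[y * tex->width + x]; // nearest-neighbor, no mips
    }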

Binding generator

Using arbitrary C libraries from the LLVM target requires generating some kind of Slang bindings for the libraries. Getting the correct calling convention for passing structure parameters is challenging to do portably, and effectively requires interaction with Clang.

I've already made a binding generator for the C++-based CPU target, but its dependency on the Python bindings of Clang is limiting, so I'll probably rewrite it in C++.

I'm planning to work on this in my own repo first. I may propose it for inclusion in tools/ or extras/ if it is deemed to be of general interest.

Pointer size refactor

The goal is to get rid of SLANG_PTR_IS_32 & SLANG_PTR_IS_64 defining the target's pointer size all around Slang. When the target is LLVM, we should instead determine the pointer size from the target triple.
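
As a sketch of the LLVM-side query (header paths and exact signatures vary a bit between LLVM versions, and target initialization would normally happen once elsewhere), the pointer width can be taken from the DataLayout of a TargetMachine built from the triple:

    // Note: header locations differ across LLVM versions
    // (e.g. TargetRegistry.h moved from Support/ to MC/).
    #include "llvm/MC/TargetRegistry.h"
    #include "llvm/Support/TargetSelect.h"
    #include "llvm/Target/TargetMachine.h"

    #include <memory>
    #include <optional>
    #include <string>

    unsigned getPointerSizeInBits(const std::string& tripleString)
    {
        // Normally done once at startup, not per query.
        llvm::InitializeAllTargetInfos();
        llvm::InitializeAllTargets();
        llvm::InitializeAllTargetMCs();

        std::string error;
        const llvm::Target* target =
            llvm::TargetRegistry::lookupTarget(tripleString, error);
        if (!target)
            return 0;

        std::unique_ptr<llvm::TargetMachine> machine(target->createTargetMachine(
            tripleString, /*CPU*/ "", /*Features*/ "",
            llvm::TargetOptions(), std::nullopt));

        // Address space 0 is the default/generic address space on CPU targets.
        return machine->createDataLayout().getPointerSizeInBits(0);
    }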

DeSPMD integration

This is an LLVM IR pass that turns a function into a thread group / workgroup, taking barriers and subgroup operations properly into account. It has been discussed in #8244 and would help address that issue.

I have been privately given access to it, so I can begin working on this quite soon, but I cannot release code until related ongoing research has been published.

LLVM layout

It would be great if the "LLVM type layout" really used the DataLayout of LLVM's target machine to determine alignments. This may require quite a bit of refactoring all around Slang to work, so I didn't tackle it yet. It could also feel unpredictable to users that the vector layout can depend on the platform, so I'm not sure whether this is actually desirable. The layout in this PR plays it safe; things may be excessively aligned for some platforms.
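
For reference, querying a struct's size, alignment, and field offsets from an LLVM DataLayout looks roughly like this. It is a self-contained example with an arbitrary layout string; a real implementation would take the DataLayout from the target machine, and some return types differ slightly across LLVM versions:

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/IR/LLVMContext.h"

    #include <cstdio>

    int main()
    {
        llvm::LLVMContext context;
        // Example layout string for a typical 64-bit target; in practice this
        // would come from TargetMachine::createDataLayout().
        llvm::DataLayout layout("e-m:e-p:64:64-i64:64-f80:128-n8:16:32:64-S128");

        llvm::StructType* type = llvm::StructType::get(
            context,
            {llvm::Type::getFloatTy(context),      // float
             llvm::Type::getInt8Ty(context),       // i8
             llvm::Type::getDoubleTy(context)});   // double

        const llvm::StructLayout* structLayout = layout.getStructLayout(type);
        // Sizes/offsets are uint64_t or TypeSize depending on the LLVM version,
        // hence the plain casts here.
        std::printf("size = %llu, align = %llu\n",
            (unsigned long long)structLayout->getSizeInBytes(),
            (unsigned long long)structLayout->getAlignment().value());
        for (unsigned i = 0; i < type->getNumElements(); i++)
            std::printf("field %u offset = %llu\n", i,
                (unsigned long long)structLayout->getElementOffset(i));
    }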

Existing C++-specific features

class types and COM interfaces are not yet supported on the LLVM targets.

GPU targets (OpenCL SPIR-V, DXIL)

This new LLVM target could provide a path to address #2533. I have previously taken a brief stab at modifying the existing SPIR-V emitter to allow for OpenCL support, but there are major differences that made it more effort than I was willing to expend at the time.

LLVM has targets for OpenCL SPIR-V and DXIL. I'm not yet sure of all the things that would be necessary for supporting these, but it would at least require mapping Slang resource types to these targets somehow and handling pointer address spaces correctly.

Since there was an existing legalization pass for this, I just use that to do this transformation.
juliusikkala requested a review from a team as a code owner on November 9, 2025 at 22:53.
juliusikkala added the pr: new feature and pr: non-breaking (PRs without breaking changes) labels on Nov 9, 2025.
@tangent-vector
Contributor

@juliusikkala I believe @csyonghe has been your main point of contact while you've pursued this work, and he's currently away on leave. If you would like this to be reviewed soon, I will plan to take on the review this week (it will take me a while to get through it...), but if you would rather wait for him to be available to weigh in, that is also okay.

@tangent-vector
Contributor

Okay, just from reading the PR description I'm excited about reviewing this. I appreciate the amount of detail you've gone into in providing context and motivation to help a reviewer understand what they are diving into. The overall architecture you are describing sounds reasonable, but I haven't looked at the details in the code yet.

@tangent-vector
Contributor

tangent-vector commented Nov 17, 2025

Notes on some of your future-work items:

Type layout, pointer sizes, etc.

Yes, we should try to fix all of that. I agree that the integration work could get a little messy (the code in slang-type-layout.cpp and slang-parameter-binding.cpp would need a way to talk to an interface implemented on the LLVM side of the fence), but it seems worth it to not maintain an ad hoc reimplementation of the actual layout rules of the target.

In principle it seems like such an approach would also make it easier to link some clang code into slang-llvm.dll and thereby also support computing layouts that are compatible with a target's C/C++ ABI. Do you have any insight into whether that would be feasible?

(Oh yeah, I forgot that we already link clang into slang-llvm.dll so that we can use it to compile our generated C++, when using that back-end option...)

Texture/Resource Types

For the sake of anybody reading along: there's a big difference between the "resource" types that can be reasonably implemented on CPU as just a pointer to a buffer (ConstantBuffer<T>, RWStructuredBuffer<T>, etc.) and those that cannot, because they may involve texel format conversion (possibly even implicitly) including block compression, and/or typically make use of non-trivial storage layouts (e.g., storing data swizzled in z-order, as tiles, padded to specific row alignment, etc.). This bit of future work is only about the latter.

Yeah, there's no one-size-fits-all solution. The original choices in the existing CPU back-end (for better or worse) were:

  • By default, implement a Texture2D, etc., as a pointer to a simple COM interface, and then translate sampling operations into calls into that interface.
  • Compile the generated C++ code against a "prelude" file that a user could, in principle, customize or replace to slot in their own implementation of the relevant types/operations.

I was never entirely happy with either of those choices, and in particular the whole "prelude" feature of the Slang compiler has been a thorn in my side ever since and I'd like to see it go.

The basic idea of having the Slang core module declare the texture-sampling operations for CPU targets but intentionally not implement them seems like a good overall approach, since it leaves developers with a lot of flexibility. So long as we are willing to bake in the assumption that a Texture2D on CPU is always just one pointer (and anybody who needs more just eats the cost of an indirection to their larger data structure), that seems reasonable. The only frustrating bit that remains in that case is that of memory lifetime management.

I know that the support for DescriptorHandle<T> for bindless resources on GPUs makes use of a core-module function that applications can define in their own code via link-time-specialization (to support bindless on targets that don't have a clean/orthogonal native implementation, but can support it with application-specific logic). It is possible that the same mechanism could apply to the case of CPU textures/resources.

DeSPMD

When it comes to DeSPMD or any comparable pass for allowing GPU-style SPMD kernel code to be run efficiently on CPU-style SIMD architectures: this would be great to have and would address one of the most frustrating long-standing gaps in Slang's story as a highly portable kernel language. It is perhaps understandable why the implementation effort so far has focused more on the GPU targets than CPU, but @csyonghe and I tried to lay the groundwork early on so that a motivated contributor could come along and adopt the CPU targets.

Given that you've had your hands in all the relevant code, you already know all this, but I will state the following for the sake of anybody following along: in practice, the Slang compiler conceptually has two very different ways of targeting CPUs:

  • We can treat a CPU as another kind of parallel computing device for executing kernels, akin to a GPU. Under this model, we can compile [shader("compute")] entry points to CPU (and there's in principle no reason somebody couldn't try to get other stages, such as ray-tracing, working... it just requires a motivated contributor). A compiled CPU kernel entry point is a function that runs the kernel for a full workgroup/threadgroup/block on a single CPU thread (and hopefully the working set of that group/block maps nicely to the L2 cache); executing an entire grid/dispatch across multiple threads is left up to runtime systems (it is beyond the scope of what the Slang compiler tries to address).

  • We can treat Slang as an ordinary systems/application programming language for writing (primarily) scalar CPU code. Under this model there is no specific notion of an "entry point," and instead it is natural to think of, e.g., a C++ application running on CPU wanting to directly call a Slang function, or vice versa. In this case, there is no implicit SPMD execution, and the Slang language has some additional features (such as class types and [COM] interfaces) that are more suitable to the CPU/host programming domain.

Integrating DeSPMD would be great for the CPU-as-device use case, but I would like to make sure that we don't bind tightly to the assumption that such a pass is always being used, so that we can leverage this LLVM back-end work across both the CPU-as-device and Slang-as-ordinary-CPU-language modes.

Binding Generation

Yes please! Binding generation for Slang has been on my list of pet projects for years now; I'd like to get to it but never seem to have the time.

I see the binding-generation problem as going in two directions:

  • First, we have the case of Slang code consuming APIs created for/in other languages. A clang-based tool that scrapes declarations from C headers and translates them into Slang source code, including declarations with all the necessary attributes to make them link correctly, would be great to have, and I think it would be good to host it under the shader-slang org (and probably even as part of the slang-llvm project, given the dependencies). There are all kinds of bells and whistles that can be added to tools like that, but just having something would be a big step in the right direction.

  • Second, we have the case of code in non-Slang languages consuming APIs authored in Slang. The SlangPy project already does quite a bit of the bridging in the case of Python, so the main other language that I'd consider it a priority to build a bridge to is C++ (Rust would be nice too, but the semantic divide is wider there, so it would be tricky). I think it would be very do-able to add a mode/flag to slangc so that when compiling, say, foo.slang it also emits a foo.slang.h that includes a best-effort C++ header matching what was declared in the .slang file. That emission should be based off of the Slang AST (not the IR!), and would need to skip over any parts of a module's API that we cannot automatically translate. Slang functions/methods could have their C++ equivalent declared nested in the appropriate namespace/type, and then have an inline definition that declares and then invokes a corresponding extern C function using the actual mangled symbol name (all our mangled names should also be usable as identifiers in C/C++).
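
To make that second case concrete, a hypothetical generated foo.slang.h for a Slang function float MyModule.addOne(float x) might contain something like the following (the mangled name is a made-up placeholder, not Slang's actual mangling scheme):

    // foo.slang.h (hypothetical generated output)
    #pragma once

    // The actual exported symbol, using the Slang-mangled name of the function.
    extern "C" float _placeholder_mangled_MyModule_addOne(float x);

    namespace MyModule
    {
        // Best-effort C++ wrapper matching the Slang declaration.
        inline float addOne(float x)
        {
            return _placeholder_mangled_MyModule_addOne(x);
        }
    }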

The first of those tools would of course be great for exposing the wealth of existing C libraries to Slang, but the combination of both kinds of binding generation should, in principle, also make it easier for developers to build and maintain hybrid Slang/C++ codebases, by automating generation of bindings at build time with a flow like:

  • Process cpp-stuff.h to generate cpp-stuff.h.slang

  • Compile slang-stuff.slang code (allowing it to import cpp-stuff.h.slang) and generate slang-stuff.slang.h, along with the output .slang-modules and/or .os

  • Compile the C++ code (which can now #include "slang-stuff.slang.h") to generate all the necessary .os

  • Link your hybrid C++/Slang binary

  • World domination

@juliusikkala
Contributor Author

@tangent-vector there's no major rush with the PR, but I would very much appreciate your review! 😄 I have already begun working on and designing some of that "future work" on top of this branch, and having this initial implementation finalized and possibly merged some time in the near future would be very useful to provide a solid foundation.

In particular, the binding generator is partly done already and I've done some more pondering on the texture & acceleration structure support. Operations on them could possibly also be vectorized even if they use user-specified external functions; the LLVM call attribute vector-function-abi-variant allows one to provide vectorized alternatives of a function to the autovectorizer, such as those provided by Embree (rtcIntersect4/8/16). The presence of vectorized functions can be made optional at link-time by providing weakly-linked fallback implementations that sequentially call the scalar version instead. But I digress.
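
To make that slightly more concrete, here is a rough sketch of attaching the attribute through LLVM's C++ API; the attribute string and both function names are illustrative only:

    #include "llvm/IR/Attributes.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/LLVMContext.h"

    // Attach LLVM's "vector-function-abi-variant" string attribute to a call
    // site, telling the loop vectorizer that a 4-wide variant of the callee
    // exists under the given VFABI-mangled name.
    void addVectorVariantHint(llvm::CallInst* call)
    {
        llvm::LLVMContext& context = call->getContext();
        call->addFnAttr(llvm::Attribute::get(
            context,
            "vector-function-abi-variant",
            "_ZGVbN4v_traceRayScalar(traceRayVec4)"));
    }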

All of the discussions I've had with @csyonghe regarding this feature have occurred on Discord, on the slang-dev channel and committers channel, in case you feel like checking what we have discussed. Thanks again!

@juliusikkala
Contributor Author

Thanks for the comments on future work! Re:

Type layout, pointer sizes, etc.

I've had a brief look through the Clang API (for the binding generator), but it hasn't been rosy. I'm not yet sure how much interaction with Clang is reasonable in this context without losing one's sanity and having to emit temporary C/C++ code to feed to Clang anyway... The least awful option is probably to construct the structs in the Clang AST, but even that would be a huge effort.

That said, my understanding is that LLVM's own struct layout already matches Clang's, so if we simply want to query struct layout details, we can probably do so without having to bother with Clang. LLVM's API is much better documented and has felt easier to use.

Texture / resource types

Yeah, I also worry about the lifetime management part, especially with the RayQuery object, which also needs to be an opaque pointer but is allocated at runtime inside the shader. Perhaps deallocation could happen at the thread-group level, where the "runtime" deallocates everything at once when the thread group finishes.

DeSPMD

I agree that there must be no reliance on such passes for the scalar/host mode.

This PR already supports the kernel mode by simply generating a for-loop over the work items in LLVMBuilder::emitComputeEntryPointWorkGroup. This matches what the current C++ target does, but such a simplistic approach makes it impossible to support barriers correctly.

This loop is only ever generated for the llvm-shader-obj and llvm-shader-ir targets; no such loops are generated for the corresponding host targets.

DeSPMD should, in my understanding, replace that naive loop in a way that takes all kinds of barriers (workgroup and subgroup/warp) into account correctly. This is apparently very non-trivial to do correctly and efficiently.
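
For anyone following along, the naive loop is conceptually equivalent to the following C++ (the real code emits it as LLVM IR through the Builder, and the names here are illustrative). A barrier inside the kernel body clearly cannot work with this structure, which is exactly what a DeSPMD-style transform has to address:

    #include <cstdint>

    struct UInt3 { uint32_t x, y, z; };

    // Stand-in for the per-thread kernel body produced from the Slang entry point.
    void kernelBody(UInt3 groupID, UInt3 localID);

    // One call runs an entire workgroup on a single CPU thread; spreading the
    // full dispatch grid across OS threads is left to the runtime/application.
    void runWorkGroup(UInt3 groupID, UInt3 groupSize)
    {
        for (uint32_t z = 0; z < groupSize.z; z++)
            for (uint32_t y = 0; y < groupSize.y; y++)
                for (uint32_t x = 0; x < groupSize.x; x++)
                    kernelBody(groupID, UInt3{x, y, z});
    }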

Binding Generation

The tool I've been working on currently focuses only on the first case. Some more careful naming will be needed, because there are likely many more people who want a tool for the second case instead.

(all for the sake of world domination)

@juliusikkala
Contributor Author

The Mac CI failure is most likely this: actions/runner-images#13341

@juliusikkala
Contributor Author

The most recent fixes are due to porting my Slang-on-CPU utilities to this LLVM target. I found a couple more bugs that way. That project's tests now pass with the targets introduced in this PR, so I don't expect to make any more changes to this PR before receiving feedback.
