LLVM IR target #8960
More likely than you think...
Since there was an existing legalization pass for this, I just use that to do this transformation.
|
@juliusikkala I believe @csyonghe has been your main point of contact while you've pursued this work, and he's currently away on leave. If you would like this to be reviewed soon, I will plan to take on the review this week (it will take me a while to get through it...), but if you would rather wait for him to be available to weigh in, that is also okay. |
|
Okay, just from reading the PR description I'm excited about reviewing this. I appreciate the amount of detail you've gone into in providing context and motivation to help a reviewer understand what they are diving into. The overall architecture you are describing sounds reasonable, but I haven't looked at the details in the code yet. |
|
Notes on some of your future-work items:
Type layout, pointer sizes, etc.
Yes, we should try to fix all of that. I agree that the integration work could get a little messy (the code in In principle it seems like such an approach would also make it easier to link some clang code into (Oh yeah, I forgot that we already link clang into
Texture/Resource Types
For the sake of anybody reading along: there's a big difference between the "resource" types that can be reasonably implemented on CPU as just a pointer to a buffer ( Yeah, there's no one-size-fits-all solution. The original choices in the existing CPU back-end (for better or worse) were:
I was never entirely happy with either of those choices, and in particular the whole "prelude" feature of the Slang compiler has been a thorn in my side ever since and I'd like to see it go. The basic idea of having the Slang core module declare the texture-sampling operations for CPU targets but intentionally not implement them seems like a good overall approach, since it leaves developers with a lot of flexibility. So long as we are willing to bake in the assumption that a I know that the support for
DeSPMD
When it comes to DeSPMD or any comparable pass for allowing GPU-style SPMD kernel code to be run efficiently on CPU-style SIMD architectures: this would be great to have and would address one of the most frustrating long-standing gaps in Slang's story as a highly portable kernel language. It is perhaps understandable why the implementation effort so far has focused more on the GPU targets than CPU, but @csyonghe and I tried to lay the groundwork early on so that a motivated contributor could come along and adopt the CPU targets. Given that you've had your hands in all the relevant code, you already know all this, but I will state the following for the sake of anybody following along: in practice, the Slang compiler conceptually has two very different ways of targeting CPUs:
Integrating DeSPMD would be great for the CPU-as-device use case, but I would like to make sure that we don't bind tightly to the assumption that such a pass is always being used, so that we can leverage this LLVM back-end work across both the CPU-as-device and Slang-as-ordinary-CPU-language modes.
Binding Generation
Yes please! Binding generation for Slang has for years been on my list of pet projects I'd like to get to but never seem to have time for. I see the binding-generation problem as going in two directions:
The first of those tools would of course be great for exposing the wealth of existing C libraries to Slang, but the combination of both kinds of binding generation should, in principle, also make it easier for developers to build and maintain hybrid Slang/C++ codebases, by automating generation of bindings at build time with a flow like:
|
|
@tangent-vector there's no major rush with the PR, but I would very much appreciate your review! 😄 I have already begun working on and designing some of that "future work" on top of this branch, and having this initial implementation finalized and possibly merged some time in the near future would be very useful to provide a solid foundation. In particular, the binding generator is partly done already and I've done some more pondering on the texture & acceleration structure support. Operations on them could possibly also be vectorized even if they use user-specified external functions; the LLVM call attribute All of the discussions I've had with @csyonghe regarding this feature have occurred on Discord, on the slang-dev channel and committers channel, in case you feel like checking what we have discussed. Thanks again! |
|
Thanks for the comments on future work!
Re: Type layout, pointer sizes, etc.
I've had a brief look through the Clang API (for the binding generator), but it hasn't been rosy. I'm not yet sure how much interaction with Clang is reasonable in this context without losing one's sanity and having to emit temporary C/C++ code to feed to Clang anyway... The least awful option is probably to construct the structs in Clang AST, but it's probably still a huge effort. That said, my understanding is that LLVM's own struct layout already matches Clang's, so if we simply want to query struct layout details, we can probably do so without having to bother with Clang. LLVM's API is much better documented and has felt easier to use.
Texture / resource types
Yeah, I also worry about the lifetime management part, especially with the RayQuery object, which needs to also be an opaque pointer but is allocated during runtime inside the shader. Perhaps deallocation could occur at thread group level, where the "runtime" deallocates everything at once when the thread group finishes.
DeSPMD
I agree that there must be no reliance on such passes for the scalar/host mode. This PR already supports the kernel mode by simply generating a for-loop over the work items in This loop is only ever generated for the DeSPMD should, in my understanding, be a replacement for that naive loop in such a way that all kinds of barriers (workgroup and subgroup/warp) are taken into account correctly. This is apparently very non-trivial to do correctly and efficiently.
Binding Generation
The tool I've been working on currently only focuses on the first case. Some more careful naming for the tool is necessary, because there currently are likely many times more people wanting a tool for the second case instead. (all for the sake of world domination) |
|
Mac CI fail is most likely this: actions/runner-images#13341 |
|
The most recent fixes are due to porting my Slang-on-CPU utilities to this LLVM target. I found a couple more bugs that way. That project's tests now pass with the targets introduced in this PR, so I don't expect to make any more changes to this PR before receiving feedback. |
Direct-to-LLVM IR target
TL;DR: This PR implements a new LLVM IR target which does not go through C++ as CPU targets have previously done. It translates Slang IR directly into LLVM IR.
This target is documented in docs/llvm-target.md, so I am not repeating the details here.
Benefits from the user perspective:
noalias for out/inout params)
Benefits from a compiler developer perspective:
Brief code overview
The LLVM target is split in two parts: Emitter (source/slang/slang-emit-llvm.cpp) and Builder (source/slang-llvm/slang-llvm-builder.cpp).
This split is done because we practically cannot access both Slang IR and LLVM IR in the same source file: LLVM is fully optional in Slang, so slang-compiler.dll cannot link to it and therefore cannot work with LLVM APIs such as their IRBuilder. Conversely, slang-llvm.dll should not link to slang-compiler.dll in order to avoid cyclical linking and issues with debug/release build compatibility on Windows.
Emitter
The Emitter side works similarly to the other slang-emit-*.cpp files. It handles Slang IR and constructs LLVM IR using the Builder.
The main class LLVMEmitter handles traversing Slang IR and emitting instructions. The central function here is emitLLVMInstruction(). LLVMTypeTranslator handles types and their layout rules. getType() returns the normal LLVM types, and getDebugType() gets debug versions with far more metadata.
Builder
The Builder side exposes a COM interface, ILLVMBuilder, which can be used to construct LLVM IR. It does a similar job as llvm::IRBuilder, but is streamlined to make the interface smaller. It also handles code generation. It gives out LLVM instructions and types to the Emitter as opaque pointers.
Internally, the builder uses LLVM's IRBuilder and DIBuilder interfaces. An alternate approach would've been to just generate the LLVM IR in text form by ourselves; however, that is a brittle approach when updating LLVM versions. By using LLVM's APIs, we're much more likely to get compile errors when LLVM IR changes with some update, making this approach more maintainable.
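The Emitter/Builder split described above can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `IToyBuilder`, `ToyBuilder`, `OpaqueValue` and `emitSum` are made-up names, and the real `ILLVMBuilder` is a COM interface with a different and much larger API. The sketch only shows the idea that the emitter side works purely with opaque handles, while the builder side alone knows what is behind them.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical opaque handle: the emitter side never sees its definition.
struct OpaqueValue;

// Hypothetical, heavily reduced stand-in for a builder interface.
struct IToyBuilder
{
    virtual ~IToyBuilder() = default;
    virtual OpaqueValue* emitConstInt(int value) = 0;
    virtual OpaqueValue* emitAdd(OpaqueValue* lhs, OpaqueValue* rhs) = 0;
    virtual std::string describe(OpaqueValue* value) = 0;
};

// "Emitter side": compiles with only the forward declaration and the
// interface, because it just passes handles back to the builder.
std::string emitSum(IToyBuilder& builder, int a, int b)
{
    OpaqueValue* lhs = builder.emitConstInt(a);
    OpaqueValue* rhs = builder.emitConstInt(b);
    return builder.describe(builder.emitAdd(lhs, rhs));
}

// "Builder side": the only place that defines what a handle points to.
// A string stands in for an LLVM instruction in this toy version.
struct OpaqueValue
{
    std::string text;
};

class ToyBuilder : public IToyBuilder
{
public:
    OpaqueValue* emitConstInt(int value) override
    {
        return store(std::to_string(value));
    }

    OpaqueValue* emitAdd(OpaqueValue* lhs, OpaqueValue* rhs) override
    {
        assert(lhs && rhs);
        return store("add(" + lhs->text + ", " + rhs->text + ")");
    }

    std::string describe(OpaqueValue* value) override
    {
        assert(value);
        return value->text;
    }

private:
    // The builder owns every value it hands out, so handles stay valid
    // for as long as the builder is alive.
    OpaqueValue* store(std::string text)
    {
        values.push_back(std::make_unique<OpaqueValue>(OpaqueValue{std::move(text)}));
        return values.back().get();
    }

    std::vector<std::unique_ptr<OpaqueValue>> values;
};
```

With a `ToyBuilder builder;`, calling `emitSum(builder, 2, 3)` yields the string `add(2, 3)`. In a real split the two sides would live in separate binaries, which is why the interface cannot expose LLVM types directly.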
Architecturally, the Builder is quite straightforward: most functions just emit one LLVM instruction, although they often smooth over corner cases and inconsistencies in LLVM's API. The more interesting functions are the ones doing codegen:
emitInlineIRFunction()
emitComputeEntryPointWorkGroup()
emitComputeEntryPointDispatcher()
generateAssembly()
generateObjectCode()
generateJITLibrary()
Meta discussion
I have worked on this PR for a long while now, but I'm not particularly attached to any architectural choices here. I'm open to any suggestions; I'd rather address any concerns than rush this PR. I'm fine with large refactors.
I do realize that this is a massive PR and a lot to chew on for potential reviewers. Let me know if you'd rather I split this up into separate PRs, e.g. as follows:
*.meta.slang
I chose to begin with this big all-in-one PR because it's testable as a whole, whereas partial implementations would not be able to run tests.
Future work & plans
This section is "optional" reading; it doesn't affect this PR directly.
These are items that are left unimplemented in this PR to limit the scope. Some of these are long-term "wishlist" features, which become much more attainable with the direct-to-LLVM emitter.
These are in a rough order of importance to me. I'm planning to work on some of this stuff myself. If someone else wants to do any of the things listed below, that's also cool by me; I can review and provide help with that.
Texture and acceleration structure support
The problem with these types is that there's no "objective" or even "de-facto" memory layout for them on CPU:
Especially the latter is worrying to me, since one known use-case for the CPU backends of Slang is particle simulation (#8244), where one could want to do collision testing with the scene using an AS. It would be good to allow users to take advantage of the acceleration structures of their physics engines.
I think these features could be implemented by lowering their types as opaque pointers, and forward declaring all sampling / tracing functions when they are used. It is then up to the user to link proper implementations for the operations they use.
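The opaque-pointer idea above can be sketched in plain C++. All names here (`ToyTexture2D`, `toySampleTexture2D`, `shadePixel`) are hypothetical and not part of Slang or this PR; the sketch assumes a single-channel texture and nearest-neighbour sampling purely for illustration. The first half stands in for what generated code would see (an incomplete type and a declaration), and the second half is what a user might link in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- What the generated code could see (hypothetical names) ---
// Opaque handle: the layout is unknown at this point.
struct ToyTexture2D;

// Forward declaration only; the user is expected to link an implementation.
extern "C" float toySampleTexture2D(const ToyTexture2D* texture, float u, float v);

// A stand-in "shader" that relies only on the declaration above.
float shadePixel(const ToyTexture2D* texture, float u, float v)
{
    return 0.5f * toySampleTexture2D(texture, u, v);
}

// --- What a user could link in ---
// The user decides the actual memory layout of the texture.
struct ToyTexture2D
{
    std::size_t width;
    std::size_t height;
    std::vector<float> texels; // row-major, single channel
};

extern "C" float toySampleTexture2D(const ToyTexture2D* texture, float u, float v)
{
    assert(texture && texture->width > 0 && texture->height > 0);
    assert(texture->texels.size() == texture->width * texture->height);

    // Nearest-neighbour lookup with coordinates clamped to [0, 1].
    auto toIndex = [](float coord, std::size_t size) {
        if (coord < 0.0f) coord = 0.0f;
        if (coord > 1.0f) coord = 1.0f;
        std::size_t i = static_cast<std::size_t>(coord * static_cast<float>(size));
        return i < size ? i : size - 1;
    };

    std::size_t x = toIndex(u, texture->width);
    std::size_t y = toIndex(v, texture->height);
    return texture->texels[y * texture->width + x];
}
```

Because `shadePixel` only touches the handle through the declared function, a different implementation (for example one backed by a physics engine's own data structures) could be swapped in at link time without changing the caller.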
slang-rt could perhaps provide some default implementations.
Binding generator
Using arbitrary C libraries from the LLVM target requires generating some kind of Slang bindings for the libraries. Getting the correct calling convention for passing structure parameters is challenging to do portably, and effectively requires interaction with Clang.
I've already made a binding generator for the C++-based CPU target, but its dependency on the Python bindings of Clang is limiting and I'll probably rewrite this in C++.
I'm planning to work on this in my own repo first. I may propose it for inclusion in tools/ or extras/ if it is deemed to be of general interest.
Pointer size refactor
Just get rid of SLANG_PTR_IS_32 & SLANG_PTR_IS_64 defining the target's pointer size all around Slang. When the target is LLVM, we should check the pointer size based on the target triple.
DeSPMD integration
This is an LLVM IR pass that turns a function into a thread group / workgroup, taking barriers and subgroup operations properly into account. It has been discussed in #8244 and would help address that issue.
I have been privately given access to it, so I can begin working on this quite soon, but I cannot release code until related ongoing research has been published.
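To show why barriers matter when a workgroup is run on a CPU, here is a small hand-written sketch. It is not DeSPMD and says nothing about how that pass works internally; it also does not claim to reproduce the loop this PR generates. It assumes a toy kernel in which each work item writes one slot of group-shared memory, waits at a barrier, and then reads its neighbour's slot. `kGroupSize`, `runNaive` and `runSplitAtBarrier` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kGroupSize = 4; // hypothetical workgroup size

// Naive lowering: run the whole kernel body for one work item at a time.
// The barrier between the two statements is simply ignored, so a work item
// may read a shared slot that its neighbour has not written yet.
std::vector<int> runNaive(const std::vector<int>& input)
{
    assert(input.size() == kGroupSize);
    std::vector<int> shared(kGroupSize, 0);
    std::vector<int> output(kGroupSize, 0);
    for (std::size_t tid = 0; tid < kGroupSize; ++tid)
    {
        shared[tid] = input[tid];
        // (the kernel's barrier would sit here)
        output[tid] = shared[(tid + 1) % kGroupSize];
    }
    return output;
}

// Barrier-aware lowering done by hand: the loop is split at the barrier,
// so every work item finishes the first phase before any starts the second.
std::vector<int> runSplitAtBarrier(const std::vector<int>& input)
{
    assert(input.size() == kGroupSize);
    std::vector<int> shared(kGroupSize, 0);
    std::vector<int> output(kGroupSize, 0);
    for (std::size_t tid = 0; tid < kGroupSize; ++tid)
        shared[tid] = input[tid];
    // The end of the first loop plays the role of the barrier.
    for (std::size_t tid = 0; tid < kGroupSize; ++tid)
        output[tid] = shared[(tid + 1) % kGroupSize];
    return output;
}
```

For the input `{10, 20, 30, 40}`, `runNaive` returns `{0, 0, 0, 10}` while `runSplitAtBarrier` returns `{20, 30, 40, 10}`. Splitting by hand is easy for straight-line code like this; barriers inside control flow and subgroup operations are what make an automatic pass hard to get right.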
LLVM layout
It would be great if the "LLVM type layout" really used the DataLayout of LLVM's target machine to determine alignments. This may require quite a bit of refactoring all around Slang to work, so I didn't tackle this one yet. It could also feel unpredictable to users that the vector layout can depend on the platform, so I'm not sure whether this is actually desirable. The layout in this PR plays it safe; things may be excessively aligned for some platforms.
Existing C++-specific features
class and COM interfaces are not yet supported on the LLVM targets.
GPU targets (OpenCL SPIR-V, DXIL)
This new LLVM target could provide a path to address #2533. Earlier, I've taken a brief stab at modifying the existing SPIR-V emitter to allow for OpenCL support, but there are major differences making it more effort than I was willing to expend at that time.
LLVM has targets for OpenCL SPIR-V and DXIL. I'm not yet sure of all the things that would be necessary for supporting these, but it would at least require mapping Slang resource types to these targets somehow and handling pointer address spaces correctly.