zerbina commented Apr 13, 2025

Add a pass that translates L3 code into LLVM IR, making it possible to compile source-language programs, as well as NimSkull programs compiled via skully, into native executables with LLVM.

At the moment, the pass is biased towards code produced by skully.

Instead of using the LLVM API to create the LLVM IR and compile it into an executable, the pass simply emits textual LLVM IR, which has a number of benefits:

  • no bindings to the LLVM API are needed
  • phy doesn't depend on a dev version of LLVM; LLVM doesn't have to be built from source to compile phy
  • the textual IR is much more stable than the LLVM API
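
To illustrate why no bindings are needed: emitting textual IR boils down to string formatting. The following is a minimal Python sketch, not phy's actual implementation; `emit_binary_function` is a hypothetical helper that produces a complete function definition in LLVM's textual format.

```python
def emit_binary_function(name: str, op: str, ty: str = "i64") -> str:
    """Return textual LLVM IR for a function computing `a <op> b`.

    Hypothetical helper: plain string formatting stands in for the LLVM API.
    """
    lines = [
        f"define {ty} @{name}({ty} %a, {ty} %b) {{",
        "entry:",
        f"  %r = {op} {ty} %a, %b",
        f"  ret {ty} %r",
        "}",
    ]
    return "\n".join(lines) + "\n"


module = emit_binary_function("add", "add")
print(module)
```

The printed text is a valid LLVM module on its own, so it can be written to a `.ll` file and handed to `llvm-as` or `llc` without linking against any LLVM library.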

In the future, the pass could move to emit LLVM bitcode, which would speed up the pass and produce smaller artifacts.


To-Do

  • fix the upstream (i.e., NimSkull) bugs that make skully produce incorrect code
  • move the skully fixes into a separate PR
  • move the Ptr type introduction to a separate PR
  • split Conv into multiple operators (separate PR)
  • properly implement exception handling
  • implement Select support
  • split up runtime.c
  • test skully + LLVM pass in the CI
  • use the L25 IL as input
  • write a proper commit message

zerbina added 12 commits April 13, 2025 16:56
Introducing a dedicated pointer type for use in the earlier passes has
been planned for some time now (it removes the need to pass the pointer
size to some of the passes), and the LLVM pass more or less requires it.

Now that there's a dedicated pointer type, use it:
* translate NimSkull pointer types to the IL `Ptr` type
* emit conversions when treating globals or integer constants
  as pointers

Pointer types are no longer identical to unsigned integers, so they now
require the same to-uint conversion that signed integers do.

Converting between different pointer types is a no-op at the IL level.
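
A rough sketch of how these conversions could map onto LLVM instructions, assuming L3 types are modelled as the strings `"ptr"` and `"i8"` to `"i64"` (a hypothetical representation; `llvm_conversion` is likewise a made-up name). With LLVM's opaque pointers there is a single `ptr` type, so pointer-to-pointer conversions need no instruction, while pointer/integer conversions need an explicit `ptrtoint` or `inttoptr`.

```python
from typing import Optional

# Hypothetical modelling of L3 types: "ptr" plus fixed-width integers.
INT_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}


def llvm_conversion(src: str, dst: str, value: str,
                    signed: bool = False) -> Optional[str]:
    """Return the LLVM instruction converting `value` from `src` to `dst`,
    or None when no instruction is needed and `value` can be reused."""
    if src == "ptr" and dst == "ptr":
        # opaque pointers: every pointer has type `ptr`, so this is a no-op
        return None
    if src == "ptr":
        # pointer -> integer (e.g. the to-uint conversion)
        return f"ptrtoint ptr {value} to {dst}"
    if dst == "ptr":
        # integer -> pointer (e.g. an integer constant treated as a pointer)
        return f"inttoptr {src} {value} to ptr"
    src_bits, dst_bits = INT_BITS[src], INT_BITS[dst]
    if src_bits == dst_bits:
        return None  # LLVM integers are signless: same width needs no cast
    if src_bits > dst_bits:
        return f"trunc {src} {value} to {dst}"
    return f"{'sext' if signed else 'zext'} {src} {value} to {dst}"


print(llvm_conversion("ptr", "i64", "%p"))  # ptrtoint ptr %p to i64
print(llvm_conversion("i64", "ptr", "42"))  # inttoptr i64 42 to ptr
print(llvm_conversion("ptr", "ptr", "%p"))  # None
```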

Exceptions currently use Itanium "zero-cost" exception handling, but this
might be subject to change, should implementing the personality function
prove to be too much of a hassle.
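
For reference, this is roughly what Itanium-style handling looks like in textual IR: a call that may unwind becomes an `invoke`, whose normal path costs no more than a plain call, and during unwinding the personality function decides whether control transfers to the `landingpad`. The sketch is Python emitting IR text; `emit_guarded_call`, `@callee`, and `@phy_personality` are hypothetical names, not part of the PR.

```python
def emit_guarded_call(fn: str, callee: str, personality: str) -> str:
    """Emit a module in which `fn` calls `callee` via `invoke`; if `callee`
    unwinds, a cleanup landing pad runs and the exception is re-raised."""
    return "\n".join([
        f"declare void @{callee}()",
        f"declare i32 @{personality}(...)",
        "",
        f"define void @{fn}() personality ptr @{personality} {{",
        "entry:",
        # normal path: behaves like a plain call, no extra runtime cost
        f"  invoke void @{callee}()",
        "          to label %cont unwind label %lpad",
        "cont:",
        "  ret void",
        "lpad:",
        # reached only while unwinding, as directed by the personality function
        "  %lp = landingpad { ptr, i32 }",
        "          cleanup",
        "  ; cleanup code would go here",
        "  resume { ptr, i32 } %lp",
        "}",
    ]) + "\n"


print(emit_guarded_call("caller", "callee", "phy_personality"))
```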

A conversion was always emitted, even if the types were already equal,
but the target IL requires the Conv source and destination types to differ.
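
The fix amounts to a guard at the point where conversions are created. A minimal sketch, with IL expressions modelled as nested tuples and types as strings (both hypothetical representations, as is the name `make_conv`):

```python
# Hypothetical representation: IL expressions as nested tuples, types as strings.

def make_conv(dst_type: str, expr: tuple, src_type: str) -> tuple:
    """Wrap `expr` in a Conv node only if the types differ, since the target
    IL rejects a Conv whose source and destination types are equal."""
    if src_type == dst_type:
        return expr  # already the right type: no conversion
    return ("Conv", dst_type, expr)


print(make_conv("Int64", ("Local", 0), "Int64"))   # ('Local', 0)
print(make_conv("UInt64", ("Local", 0), "Int64"))  # ('Conv', 'UInt64', ('Local', 0))
```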

The length field, instead of the payload field, was tested for a nil
value. At compile time, this shows up as a type error in passes that care
about proper typing.

None of the previous passes cared about the size of unions, but the LLVM
code generator does, so the size and alignment now have to be set to the
correct values.
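
As a sketch of what "correct values" means here, assuming the usual C-like layout rules: every member of a union starts at offset 0, so the union is as large as its largest member, rounded up to the strictest member alignment. `union_layout` and `align_to` are hypothetical helpers taking sizes and alignments in bytes.

```python
from typing import List, Tuple


def align_to(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment` (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def union_layout(members: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the (size, alignment) of a union whose members are given as
    (size, alignment) pairs in bytes; expects at least one member."""
    alignment = max(a for _, a in members)  # strictest member alignment
    largest = max(s for s, _ in members)    # all members overlap at offset 0
    return align_to(largest, alignment), alignment


# a 9-byte char array overlapping an 8-byte, 8-aligned pointer (64-bit target)
print(union_layout([(9, 1), (8, 8)]))  # (16, 8)
```

LLVM IR has no union type, so a code generator has to pick some LLVM type with that size and alignment; for the example above, `[2 x i64]` would be one option.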

zerbina added the enhancement label Apr 13, 2025

zerbina commented Apr 13, 2025

Locally, I'm able to compile phy.nim into a working native executable, like so:

> skully phy/phy.nim phy.txt
> phy --source:L25 --target:llvm c phy.txt
> llvm-as -o phy.bc out.ll
> llc -filetype=obj -o phy.o phy.bc
> clang phy.o runtime.c

It's also possible to combine the LLVM steps by using (stock) clang, but this routinely resulted in clang crashing, at least on my system.

The above is not reproducible with a stock NimSkull yet, as there are multiple bugs in the NimSkull compiler where incorrectly typed MIR code is emitted. Since skully doesn't perform any type correction (nor should it), the incorrectly typed MIR leads to incorrectly typed L3 code, and since the LLVM IR is quite strongly typed (when compared to, e.g., the Phy VM bytecode), this presently results in llvm-as rejecting the input.


This PR will most likely stay open for quite some time, as there's no rush to finish it. It's also not clear whether it should be merged at all, or whether it's better for the LLVM pass to be implemented in a separate repository.

I mainly wanted to experiment with how easy/possible it is to implement a to-LLVM translation pass with the current compiler.

zerbina mentioned this pull request May 19, 2025
zerbina added 8 commits May 21, 2025 16:00
There's a proper `Ptr` type now.

The L3 semantics changed considerably. Conversions have become simpler,
there are dedicated pointer types, and globals support storing complex
data.

This includes building and running an executable out of the
code-generator-produced assembly code.

This also makes the LLVM IR produced by the code generator compile, as
there are no longer any duplicated interface names.

zerbina commented May 21, 2025

I've merged the PR with upstream and also added a test for the code generator to the CI.

There's still one issue remaining in the NimSkull compiler, namely, that the pred and succ translation in mirgen produces incorrect MIR code, which shows up here as a type error from the LLVM assembler.

Separately, I've concluded that the IL Select was a misfeature. I'll replace it with something akin to a branch-with-table, which is much simpler to implement, not only for the LLVM code generator but also for the VM code generator.
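
One plausible lowering, assuming the branch-with-table carries a selector value, a default target, and a list of (value, target) pairs (the actual IL design isn't spelled out here), maps directly onto LLVM's `switch` instruction. `emit_switch` is a hypothetical helper.

```python
from typing import List, Tuple


def emit_switch(ty: str, selector: str, default: str,
                table: List[Tuple[int, str]]) -> str:
    """Lower a branch-with-table (selector, default target, (value, target)
    pairs) to an LLVM `switch`; case values must be distinct constants."""
    cases = " ".join(f"{ty} {value}, label %{target}" for value, target in table)
    return f"switch {ty} {selector}, label %{default} [ {cases} ]"


print(emit_switch("i64", "%sel", "other", [(0, "b0"), (1, "b1"), (5, "b5")]))
# switch i64 %sel, label %other [ i64 0, label %b0 i64 1, label %b1 i64 5, label %b5 ]
```

Because the table translates one-to-one into `switch` operands, the LLVM backend is free to choose between a jump table and a compare chain, and a bytecode VM can likewise index a table directly.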
