Representing labels-as-values and computed `goto`s in the RVSDG #518

alexrp · 2024-06-13T10:34:28Z


Some context: #52 (and #217).

This is a GCC/Clang extension that is useful in language interpreters, emulators, etc. Back in the day, you could see speedups all the way up to 50% on the high end when using this feature (depending on lots of factors), but nowadays, 15-30% is more common as CPUs have gotten beefier and smarter. Still, that's enough of a performance gain that the feature remains in use ubiquitously in those domains. For these reasons, it seems worth figuring out how it can be supported in the RVSDG.

In LLVM land, the feature is modeled with `blockaddress` constants and `indirectbr` instructions. Helpfully, block addresses are defined very opaquely, so there is quite a lot of implementation flexibility. Also helpfully, `indirectbr` instructions must explicitly list all possible target blocks. There is also some background material on the addition of these LLVM constructs.
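For readers who have not used the extension, here is a minimal sketch of threaded dispatch with labels-as-values. The three-opcode bytecode and the name `run_threaded` are made up for illustration; only `&&label` and `goto *ptr` are the actual GNU C syntax, and the block needs GCC or Clang to compile.

```c
/* Hypothetical three-opcode bytecode, for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };

/* Runs `code` with threaded dispatch and returns the accumulator.
 * Requires the GNU C labels-as-values extension (GCC or Clang). */
static int run_threaded(const unsigned char *code)
{
    /* `&&label` takes the address of a label; Clang lowers this to a
     * `blockaddress` constant. */
    static void *const targets[] = { &&op_inc, &&op_dec, &&op_halt };
    int acc = 0;

    /* `goto *ptr` jumps to such an address; Clang lowers this to an
     * `indirectbr` that lists every label whose address was taken. */
    goto *targets[*code++];

op_inc:
    acc++;
    goto *targets[*code++];   /* each handler dispatches on its own */
op_dec:
    acc--;
    goto *targets[*code++];
op_halt:
    return acc;
}
```

Calling `run_threaded` on the program `{ OP_INC, OP_INC, OP_INC, OP_DEC, OP_HALT }` returns 2. The point of the pattern is that every handler ends in its own indirect jump instead of returning to one shared dispatch point.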

It is possible to emulate this feature by assigning an ID to each target block, lowering `blockaddress` to the appropriate ID, and lowering `indirectbr` to a `switch` containing branches for each assigned ID. This undermines the performance benefits, but it does at least allow such code to compile. This is what LLVM used to do, and it still has a pass to do it for targets that require it. WebAssembly (which doesn't support irreducible control flow) uses that pass, for example.
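At the source level, the emulation described above might look roughly like the following sketch. The bytecode, the `BLOCK_*` IDs, and `run_switch` are hypothetical names chosen for illustration; the actual LLVM pass works on IR, not on C.

```c
/* Hypothetical three-opcode bytecode, for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };

/* One integer ID per target block; these stand in for block addresses. */
enum { BLOCK_INC, BLOCK_DEC, BLOCK_HALT };

/* Runs `code` using a single switch-based dispatch point and returns
 * the accumulator. Portable C, no extensions needed. */
static int run_switch(const unsigned char *code)
{
    /* Where the original code would store `&&label`, store the ID. */
    static const int targets[] = { BLOCK_INC, BLOCK_DEC, BLOCK_HALT };
    int acc = 0;

    for (;;) {
        /* Where the original code would `goto *ptr`, switch on the ID. */
        switch (targets[*code++]) {
        case BLOCK_INC:
            acc++;
            break;
        case BLOCK_DEC:
            acc--;
            break;
        case BLOCK_HALT:
            return acc;
        default:
            return acc;   /* not reachable for valid bytecode */
        }
    }
}
```

All handlers now funnel back through one shared dispatch point, which is the structured form the RVSDG can already express and also the reason the performance benefit is reduced.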

The fact that the feature reduces to features that the RVSDG can already represent -- that is, a regular switch on a value to pick one of several statically-known branches -- makes me think that there's no particular reason why it shouldn't also be representable natively in the RVSDG.

At a very high level, I imagine the native representation might look something like:

- A new structural node ("psi"? 🤷) that has characteristics of the lambda and gamma nodes.
  - It contains a set of regions, like a gamma node.
    - These regions have arguments and results that work like those of the lambda node.
  - Its first output represents itself, like for the lambda node.
  - Its subsequent outputs represent the addresses of the contained regions (this models `blockaddress`).
- A new simple node ("jump"?) that works similarly to the apply node (this models `indirectbr`).
  - Its first input is the identity output of a psi node.
  - Its second input is the region address value (i.e. `ptr` in `goto *ptr`).
  - Its subsequent inputs are mapped to arguments of the target region.
  - The results of the target region are mapped to its outputs.
- A generalization of the phi node to arbitrary values, to break cycles that occur if code inside a psi region refers to that psi node's region address outputs.
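To make the intended semantics a bit more concrete, here is a toy executable model of the idea, not an RVSDG implementation. Every name in it (`psi_node`, `region_addr`, `psi_region_address`, `psi_jump`, the example regions) is hypothetical and exists in no library. Regions are modeled as functions with one argument and one result, region addresses as opaque indices, and the cycle that the generalized phi would break is modeled by passing the psi node into its own regions.

```c
#include <stddef.h>

/* Hypothetical toy model; none of these names come from jlm or LLVM. */
typedef struct psi_node psi_node;

/* A region maps arguments to results, like a lambda body.
 * Simplified here to one int in, one int out. */
typedef int (*psi_region)(const psi_node *self, int arg);

/* The "psi" node: a set of regions, like a gamma node. */
struct psi_node {
    size_t region_count;
    const psi_region *regions;
};

/* Opaque region address (the analogue of `blockaddress`). */
typedef struct { size_t index; } region_addr;

/* Models the psi outputs that expose the address of region `i`. */
static region_addr psi_region_address(const psi_node *psi, size_t i)
{
    region_addr addr = { i };
    (void)psi;
    return addr;
}

/* The "jump" node (the analogue of `indirectbr`): takes the psi identity,
 * a region address, and the argument; yields the target region's result. */
static int psi_jump(const psi_node *psi, region_addr target, int arg)
{
    return psi->regions[target.index](psi, arg);
}

enum { GROW, FINISH };

/* A region that refers to region addresses of its own psi node, which
 * is the cyclic case mentioned in the last bullet. */
static int region_grow(const psi_node *self, int x)
{
    if (x < 100)
        return psi_jump(self, psi_region_address(self, GROW), x * 2);
    return psi_jump(self, psi_region_address(self, FINISH), x);
}

static int region_finish(const psi_node *self, int x)
{
    (void)self;
    return x + 1;
}

static const psi_region example_regions[] = { region_grow, region_finish };
static const psi_node example_psi = { 2, example_regions };
```

With this model, `psi_jump(&example_psi, psi_region_address(&example_psi, GROW), 3)` doubles 3 until it reaches 192 and then jumps to the `FINISH` region, giving 193. A real design would of course need typed, variadic arguments and results rather than a single `int`.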

Of course, the hard part would be teaching RVSDG construction to create these. 🙂

All this being said, I don't have the theoretical background to say anything with certainty here, so I would very much like to hear everyone's thoughts.

phate (Maintainer) · 2024-06-14T04:32:42Z

So I looked a little bit at the resources you provided. Thanks for them, I learned something.

After reading them, I agree with you that it is absolutely possible to support these instructions (in contrast to what I stated in these issues). Currently, the RVSDG creation process looks like this from 10,000 feet:

1. Translate the LLVM IR control flow graph to a JLM control flow graph.
2. Restructure the JLM control flow graph.
3. Translate the restructured control flow graph to an RVSDG.

Now, we could easily support these constructs by:

1. Run the IndirectBr expansion pass on the LLVM IR control flow graph, effectively removing the unsupported instructions from the IR.
2. Continue the creation process as described above.

This would add support for these instructions without the need to expand our IR. The downside is that we would pay the performance penalty. There might be a chance to eventually recover (some of) it by implementing predicate control flow recovery, as mentioned in the paper linked above. I do not know whether this would recover the performance, but it is certainly worth a shot, as it would allow us to have our cake and eat it too:

- We pay no penalty in terms of IR implementation overhead.
- We would still get the performance (a big if).

I am sure that it is somehow possible to natively support these constructs in the RVSDG, but their nature clashes with the RVSDG's philosophy of having normalized control flow and explicit (data) dependencies. The idea of `blockaddress` is to take the address of a label. In a control flow graph setting, labels (and the instructions that follow them) can easily be mapped to basic blocks (that is why it is called `blockaddress`), but this is not possible in an RVSDG. There are no basic blocks in the RVSDG and therefore no final address to take. Instructions are not confined to basic blocks, but to regions (which have no 1-1 mapping to the basic blocks in the original CFG). The connection between the labels and the respective jumps would also not be modeled explicitly; it would only be implicit. I really do not see how to squeeze this effectively into the RVSDG construction (which does not mean that it is not possible).

In contrast, I find the first described route way more appealing:

1. Normalize the control flow constructs.
2. Try to map these constructs to efficient control flow (via explicit jumps, jump tables, god knows what) in the compiler back-end and leave it to the compiler to extract efficient control flow.

I would like to let the compiler make the decisions for me regarding efficient control flow generation, but such language constructs make this very hard. They already lock in a solution/path and then the compiler has to work very hard to lift this again to a higher abstraction level in order to be able to potentially make a different/better decision in terms of control flow generation than what the user already provided.
