[vLLM IR] Port activations to IR op #38733
Description
`SiluAndMul` and other activation function `CustomOp` subclasses should be ported over to vLLM IR. This should be done in three steps:
1. Replace the `forward_*` methods in `SiluAndMul` with a call to a new `vllm.ir.ops.silu_and_mul` (sketched below).
2. Do the same for the other activation functions.
3. Convert the `CustomOp` objects to `PluggableLayer`.
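
As a rough illustration of step 1 only: a minimal sketch of what the `SiluAndMul` change could look like, assuming `vllm.ir.ops.silu_and_mul` keeps the same semantics as the current implementation (gate/up split on the last dimension, SiLU applied to the gate half). The existing `CustomOp` scaffolding (registration decorator, etc.) is omitted, and the `vllm.ir` namespace is the one proposed in this issue, not an existing API.

```python
import torch

import vllm.ir as ir  # proposed namespace from this issue (assumption)
from vllm.model_executor.custom_op import CustomOp


class SiluAndMul(CustomOp):
    """Computes silu(x[..., :d]) * x[..., d:], where d = x.shape[-1] // 2."""

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        # Previously an explicit PyTorch implementation
        # (F.silu(x[..., :d]) * x[..., d:]); now a single IR op call, so the
        # IR layer picks the backend implementation.
        return ir.ops.silu_and_mul(x)

    # forward_cuda / forward_xpu / ... can be removed once the IR op
    # dispatches to the platform kernels itself.
```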
An additional challenge is the `compile_native=True` behavior: inside the `fused_moe` torch custom op, `SiluAndMul.forward_native` is not visible to model-level compilation, so we apply another `torch.compile` decorator there. To work with vLLM IR, we will have to locally disable torch wrapping (`vllm.ir.enable_torch_wrap(False)`), and only in the MoE case. So we should default to `compile_native=False` and only set it to `True` for MoE. Moving forward, we will enable automatic compilation of all IR native implementations by default, but that requires more design and discussion: #38744
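
A hedged sketch of that MoE special case, assuming `vllm.ir.enable_torch_wrap(False)` can be used as a context manager (it may instead end up as a plain toggle) and that locally applying `torch.compile` to the IR op is how the `compile_native=True` behavior carries over; neither detail is settled API.

```python
import torch

import vllm.ir as ir  # proposed vLLM IR namespace from this issue (assumption)


def _moe_silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # Inside the fused_moe torch custom op, model-level torch.compile never
    # sees this code, so the activation is compiled locally (the
    # compile_native=True behavior) while torch wrapping of the IR op is
    # disabled so the outer compiled graph is unaffected.
    with ir.enable_torch_wrap(False):  # assumption: usable as a context manager
        compiled_act = torch.compile(ir.ops.silu_and_mul)
        return compiled_act(x)
```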
Item 1 is high priority, item 2 slightly less so. Item 3 requires the compilation fix described above.
Also, once all out-of-tree (OOT) platforms migrate these ops to vLLM IR, we can remove the `PluggableLayer` system completely.