
[vLLM IR] Port activations to IR op #38733

@ProExpertProg

Description


SiluAndMul and other activation function CustomOp subclasses should be ported over to vLLM IR. This should be done in three steps:

  1. Replace the forward_* methods in SiluAndMul with a call to a new vllm.ir.ops.silu_and_mul.
  2. Do the same for the other activation functions.
  3. Convert the CustomOp subclasses to PluggableLayer.
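For context on step 1, silu_and_mul applies SiLU to the first half of the last dimension and multiplies elementwise by the second (gate) half. A minimal pure-Python sketch of the op's semantics (only the name vllm.ir.ops.silu_and_mul comes from this issue; the reference implementation below is illustrative, operating on flat lists rather than tensors):

```python
import math

def silu(x: float) -> float:
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def silu_and_mul(x: list[float]) -> list[float]:
    # Reference semantics of the fused op: split the last dimension in
    # half, apply SiLU to the first half, and multiply elementwise by
    # the second (gate) half. Real kernels fuse this on tensors; this
    # sketch only pins down the math.
    d = len(x) // 2
    return [silu(a) * b for a, b in zip(x[:d], x[d:])]
```

Porting then means the CustomOp's forward_* methods reduce to a single call into the IR op, with dispatch to native/CUDA kernels handled by the IR layer.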

An additional challenge is the compile_native=True behavior: inside the fused_moe torch custom op, SiluAndMul.forward_native is not visible to model-level compilation, so we apply an extra torch.compile decorator there. To make this work with vLLM IR, we will have to locally disable torch wrapping (vllm.ir.enable_torch_wrap(False)), and only in the MoE case. So compile_native should default to False and be set to True only for MoE. Going forward, we want to enable automatic compilation of all IR native implementations by default, but that requires more design & discussion: #38744

Step 1 is high priority; step 2 slightly less so. Step 3 requires the compilation fix above.

Also, once all out-of-tree (OOT) platforms have migrated these ops to vLLM IR, we can remove the PluggableLayer system entirely.


Labels

vllm-ir (vLLM IR: intermediate representation and kernel registration)
