[vLLM IR] Port activations to IR op #38733
Description
`SiluAndMul` and other activation function `CustomOp` subclasses should be ported over to vLLM IR. This should be done in three steps:
1. Replace the `forward_*` methods in `SiluAndMul` with a call to a new `vllm.ir.ops.silu_and_mul` (sketched below).
2. Do the same for the other activation functions.
3. Convert the `CustomOp` objects to `PluggableLayer`.
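
As a rough illustration of step 1 only: a minimal sketch of what the `SiluAndMul` change could look like, assuming `vllm.ir.ops.silu_and_mul` keeps the same semantics as the current implementation (gate/up split on the last dimension, SiLU applied to the gate half). The existing `CustomOp` scaffolding (registration decorator, etc.) is omitted, and the `vllm.ir` namespace is the one proposed in this issue, not an existing API.

```python
import torch

import vllm.ir as ir  # proposed namespace from this issue (assumption)
from vllm.model_executor.custom_op import CustomOp


class SiluAndMul(CustomOp):
    """Computes silu(x[..., :d]) * x[..., d:], where d = x.shape[-1] // 2."""

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        # Previously an explicit PyTorch implementation
        # (F.silu(x[..., :d]) * x[..., d:]); now a single IR op call, so the
        # IR layer picks the backend implementation.
        return ir.ops.silu_and_mul(x)

    # forward_cuda / forward_xpu / ... can be removed once the IR op
    # dispatches to the platform kernels itself.
```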
An additional challenge is the `compile_native=True` behavior: inside the `fused_moe` torch custom op, `SiluAndMul.forward_native` is not visible to model-level compilation, so we apply another `torch.compile` decorator there. To work with vLLM IR, we will have to locally disable torch wrapping (`vllm.ir.enable_torch_wrap(False)`), and only in the MoE case. So we should default to `compile_native=False` and only set it to `True` for MoE. Moving forward, we will enable automatic compilation of all IR native implementations by default, but that requires more design and discussion: #38744
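
A hedged sketch of that MoE special case, assuming `vllm.ir.enable_torch_wrap(False)` can be used as a context manager (it may instead end up as a plain toggle) and that locally applying `torch.compile` to the IR op is how the `compile_native=True` behavior carries over; neither detail is settled API.

```python
import torch

import vllm.ir as ir  # proposed vLLM IR namespace from this issue (assumption)


def _moe_silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # Inside the fused_moe torch custom op, model-level torch.compile never
    # sees this code, so the activation is compiled locally (the
    # compile_native=True behavior) while torch wrapping of the IR op is
    # disabled so the outer compiled graph is unaffected.
    with ir.enable_torch_wrap(False):  # assumption: usable as a context manager
        compiled_act = torch.compile(ir.ops.silu_and_mul)
        return compiled_act(x)
```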
Item 1 is high priority, item 2 slightly less so. Item 3 requires the compilation fix described above.
Also, once all out-of-tree (OOT) platforms migrate these ops to vLLM IR, we can remove the `PluggableLayer` system completely.