[Feature]: Expert Parallelism with CompressedTensor FP4 or NVFP4 #27231

### 🚀 The feature, motivation and pitch

To further improve the performance of large models such as Qwen3-VL-235B, FP4 quantization needs to work together with expert parallelism.

### Alternatives

_No response_

### Additional context

The current FP4 checkpoint was quantized with llm-compressor.

The current quantization method class contains an assertion that blocks the use of `--enable-expert-parallel`:
`AssertionError: Expert Parallelism / expert_map is currently not supported for CompressedTensorsW4A4MoeMethod.`

A similar warning appears with the NVFP4 method when the checkpoint is quantized through modelopt.
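
For readers unfamiliar with the `expert_map` named in the assertion, here is a minimal sketch of the idea, not vLLM's actual implementation. It assumes that an expert map is a per-rank lookup from global expert id to a local slot, with `-1` marking experts hosted on another rank. The function name `build_expert_map` and its parameters are hypothetical and exist only for illustration.

```python
def build_expert_map(num_experts: int, ep_size: int, ep_rank: int) -> list[int]:
    """Hypothetical helper: map each global expert id to a local slot on
    this rank, or -1 if the expert lives on a different rank."""
    if ep_size <= 0 or not 0 <= ep_rank < ep_size:
        raise ValueError("ep_rank must satisfy 0 <= ep_rank < ep_size")

    # Split experts as evenly as possible; the first `extra` ranks
    # each take one additional expert.
    base, extra = divmod(num_experts, ep_size)
    local_count = base + (1 if ep_rank < extra else 0)
    start = ep_rank * base + min(ep_rank, extra)

    expert_map = [-1] * num_experts
    for local_id in range(local_count):
        expert_map[start + local_id] = local_id
    return expert_map


# 8 experts over 4 ranks: rank 1 hosts global experts 2 and 3.
print(build_expert_map(num_experts=8, ep_size=4, ep_rank=1))
# [-1, -1, 0, 1, -1, -1, -1, -1]
```

Under this assumption, supporting expert parallelism in an FP4 MoE method would mean the quantized weights and their scales are sliced per rank according to such a map, and tokens routed to a `-1` entry are skipped locally.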

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
