[Feature]: Expert Parallelism with CompressedTensor FP4 or NVFP4 #27231

@Victor49152

Description

🚀 The feature, motivation and pitch

To further improve the performance of large models such as Qwen3-VL-235B, FP4 quantization combined with expert parallelism is a necessary feature.

Alternatives

No response

Additional context

The current FP4 checkpoint was quantized with llm-compressor.

The corresponding quantization method class contains an assertion that blocks the use of --enable-expert-parallel:
AssertionError: Expert Parallelism / expert_map is currently not supported for CompressedTensorsW4A4MoeMethod.

A similar warning has been seen with the NVFP4 method through modelopt.
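
For readers unfamiliar with the `expert_map` named in the assertion: under expert parallelism each rank typically holds only a subset of the MoE experts, and a map from global expert id to local expert id tells a rank which experts it owns. The sketch below illustrates that general idea only. The function name `build_expert_map`, the contiguous partitioning, and the `-1` sentinel are assumptions made for illustration, not taken from the vLLM code base.

```python
from typing import List


def build_expert_map(num_experts: int, ep_size: int, ep_rank: int) -> List[int]:
    """Hypothetical helper: map global expert id -> local expert id.

    Entries are -1 for experts that do not live on `ep_rank`.
    Assumes experts are split into contiguous blocks across ranks.
    """
    if ep_size <= 0 or not 0 <= ep_rank < ep_size:
        raise ValueError("ep_rank must satisfy 0 <= ep_rank < ep_size")

    # Spread experts as evenly as possible; the first `rem` ranks get one extra.
    base, rem = divmod(num_experts, ep_size)
    local_count = base + (1 if ep_rank < rem else 0)
    start = ep_rank * base + min(ep_rank, rem)

    expert_map = [-1] * num_experts
    for local_id in range(local_count):
        expert_map[start + local_id] = local_id
    return expert_map


if __name__ == "__main__":
    # 8 experts over 4 ranks: rank 1 owns global experts 2 and 3.
    print(build_expert_map(8, 4, 1))
```

A quantized MoE method that supports expert parallelism has to apply such a map consistently to the packed FP4 weights and their scales, which is presumably why the method currently rejects it outright.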

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
