Open
Labels
feature request
Description
🚀 The feature, motivation and pitch
To further improve the performance of large models such as Qwen3-VL-235B, support for fp4 quantization combined with expert parallelism is needed.
Alternatives
No response
Additional context
The current fp4 checkpoint was quantized with llm-compressor.
The quantization method class it uses contains an assertion that fails when --enable-expert-parallel is set:
AssertionError: Expert Parallelism / expert_map is currently not supported for CompressedTensorsW4A4MoeMethod.
A similar warning has been seen with the NVFP4 method when the checkpoint is quantized through modelopt.
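
For readers unfamiliar with the failure mode, here is a minimal Python sketch of the kind of guard described above. It is not the vLLM implementation: `build_expert_map` and `HypotheticalW4A4MoEMethod` are made-up names for illustration, and the sketch assumes that an expert map sends each global expert id to a local index on the current rank, with -1 for experts held elsewhere, and that experts split evenly across ranks.

```python
from typing import List, Optional


def build_expert_map(num_experts: int, ep_size: int, ep_rank: int) -> List[int]:
    """Map each global expert id to a local index on this rank, or -1.

    Hypothetical helper: assumes experts are split evenly and contiguously.
    """
    assert num_experts % ep_size == 0, "sketch assumes an even split"
    per_rank = num_experts // ep_size
    start = ep_rank * per_rank
    return [
        g - start if start <= g < start + per_rank else -1
        for g in range(num_experts)
    ]


class HypotheticalW4A4MoEMethod:
    """Stand-in for a quantized MoE method without expert-parallel support."""

    def apply(self, expert_map: Optional[List[int]] = None) -> str:
        # Guard of the kind reported in this issue: any expert_map is rejected,
        # so enabling expert parallelism fails before any kernel runs.
        assert expert_map is None, (
            "Expert Parallelism / expert_map is currently not supported "
            "for HypotheticalW4A4MoEMethod."
        )
        return "ok"
```

Under these assumptions, calling `apply()` without a map succeeds, while passing the output of `build_expert_map(8, 2, 1)` raises an `AssertionError`. Supporting the feature would mean replacing the guard with logic that actually uses the map to select the local experts' quantized weights and scales.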
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
tomasruizt