Conversation

@nazneenn

In the current v1.22.0 release and in the future branch, DeepSeek distill models fail because they incorrectly fall into the DeepSeek `if` condition and trigger expert parallelism, which results in the error: “Value error, Number of experts in the model must be greater than 0 when expert parallelism is enabled.”
The proposed fix removes this dependency for DeepSeek distill models and ensures that expert parallelism is enabled only for DeepSeek MoE models.
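A minimal sketch of the intended gating logic, for illustration only: the function name, the `n_routed_experts` field, and the sample configs below are assumptions, not the actual symbols touched by this PR. The idea is to gate expert parallelism on a real expert count rather than on the model name alone:

```python
from types import SimpleNamespace

def should_enable_expert_parallel(model_name: str, hf_config) -> bool:
    """Enable expert parallelism only when the model actually has experts.

    Distill checkpoints such as DeepSeek-R1-Distill-Llama-8B match a
    name-based "deepseek" check but are dense models with no routed
    experts, which is what triggers the "Number of experts in the model
    must be greater than 0" validation error.
    """
    is_deepseek = "deepseek" in model_name.lower()
    # Dense distill models expose no expert count (missing, None, or 0).
    num_experts = getattr(hf_config, "n_routed_experts", 0) or 0
    return is_deepseek and num_experts > 0

# Hypothetical configs for illustration.
moe_cfg = SimpleNamespace(n_routed_experts=256)  # DeepSeek-V3/R1-style MoE
dense_cfg = SimpleNamespace()                    # distill: no expert fields

assert should_enable_expert_parallel("deepseek-ai/DeepSeek-R1", moe_cfg)
assert not should_enable_expert_parallel(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", dense_cfg
)
```

With a check like this, distill models fall through to the default dense path, while genuine DeepSeek MoE models still get expert parallelism.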
