support GLM-4.5 MoE models #15026
Conversation
As I understand it, things didn't go well with the previous PR.

Commits:
- initial PR commit
- add GGUF constants
- initial GLM-4.5 integration
- fix typo `LLM_ATCH_GLM4_MOE` --> `LLM_ARCH_GLM4_MOE`
- add glm4_moe tensor mapping
- add `attn_k_norm` and `attn_q_norm` tensors for GLM-4.5
- more consistent organization
- more consistent organization (cont.)
- Merge branch 'ggml-org:master' into glm45
- Merge branch 'ggml-org:master' into glm45
Alright, I think I've got most of the actual implementation done. Next I need to implement the HF --> GGUF conversion and do some testing with the model before I'm ready for a full review. It would be helpful to get another pair of eyes on this, but I'm not sure who to ping.
Just briefly, I can note that you've copied the …
D'oh!
I'm laser-focused on just the correctness of the other PR, with some of my own changes and @CISC's changes, seeing if I can confirm parity against the MLX-LM implementation. If I'm successful, the results of that work can be used in this PR or the other one, or by whoever wants to get the implementation ready; otherwise I probably won't do any review work.
- remove `ffn_norm` per CISC
- re-organize some small things
Hey! I tried converting … I tried checking out both …
For now, conversion isn't implemented; see the original comment. PR #14939 has implemented conversion and there is more progress there. I'm not sure which one will get merged.
Closed in favor of #14939.
GLM-4.5 and GLM-4.5-Air are two Mixture-of-Experts LLMs released by Zhipu / Z.ai. They are highly interesting for running locally due to their size and apparent performance thus far. If successful, this PR would close #14921. For additional context, see #14939.
GLM-4.5 model info (`GLM4_MOE`)

common info
- GLM-4.5-Air
- GLM-4.5

in 🤗 transformers
- the `Glm4Moe` model can be implemented as subclasses of `DeepseekV3` components:
  - `Glm4MoeModel`
  - `Glm4MoeMLP`
  - `Glm4MoeTopkRouter` (see the routing sketch below)
  - `Glm4MoeRMSNorm`
  - `Glm4MoeDecoderLayer`
  - `Glm4MoePreTrainedModel`
  - `Glm4MoeForCausalLM` <-- this is what's in the config.json on HF
- `Glm4MoeAttention` can be implemented as a subclass of `CohereAttention` and `nn.Module` (looks pretty standard)
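Since `Glm4MoeTopkRouter` follows the `DeepseekV3` recipe, the routing step is sigmoid scoring with a learned per-expert bias (`e_score_correction_bias` in the HF config) that influences which experts are selected but not the output weights. Below is a minimal, self-contained C++ sketch of that selection logic for one token; it omits DeepSeek-V3's group-limited routing, and the expert count, top-k, and scaling factor in `main` are placeholders rather than GLM-4.5's actual config values.

```cpp
// Standalone sketch of DeepseekV3-style sigmoid top-k routing, which
// Glm4MoeTopkRouter follows. Illustrative only: all values below are
// placeholders, not GLM-4.5's real configuration.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

struct routed_expert {
    int   id;     // expert index
    float weight; // final gate weight applied to the expert output
};

// logits: raw router logits for one token, length n_expert
// bias:   learned per-expert correction bias, used for selection only
static std::vector<routed_expert> route_token(
        const std::vector<float> & logits,
        const std::vector<float> & bias,
        int n_used, float routed_scaling_factor) {
    const int n_expert = (int) logits.size();

    // 1. sigmoid scores (DeepseekV3 uses sigmoid, not softmax)
    std::vector<float> scores(n_expert);
    for (int e = 0; e < n_expert; ++e) {
        scores[e] = 1.0f / (1.0f + std::exp(-logits[e]));
    }

    // 2. rank experts by score + bias; the bias steers selection
    //    but does NOT contribute to the output weights
    std::vector<int> order(n_expert);
    std::iota(order.begin(), order.end(), 0);
    std::partial_sort(order.begin(), order.begin() + n_used, order.end(),
        [&](int a, int b) { return scores[a] + bias[a] > scores[b] + bias[b]; });

    // 3. renormalize the *unbiased* scores of the winners and apply
    //    the routed scaling factor
    float sum = 0.0f;
    for (int i = 0; i < n_used; ++i) sum += scores[order[i]];

    std::vector<routed_expert> out(n_used);
    for (int i = 0; i < n_used; ++i) {
        out[i] = { order[i], scores[order[i]] / sum * routed_scaling_factor };
    }
    return out;
}

int main() {
    // toy example: 8 experts, pick 2 (GLM-4.5's real values differ)
    std::vector<float> logits = { 0.1f, 2.0f, -1.0f, 0.5f, 1.5f, -0.3f, 0.0f, 0.8f };
    std::vector<float> bias(8, 0.0f);

    for (const auto & re : route_token(logits, bias, 2, 1.0f)) {
        std::printf("expert %d  weight %.3f\n", re.id, re.weight);
    }
}
```

In llama.cpp terms this would presumably map onto the existing `build_moe_ffn` helper with sigmoid gating rather than a hand-rolled loop; the sketch is only meant to pin down the math.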
TODOs:
- add GGUF constants
- add basic C++ code
- add case for `load_hparams` (a rough sketch follows below)
- add case for `load_tensors`
- write `llm_build_glm4_moe`
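To make the `load_hparams` TODO concrete, here's a rough sketch of what that case might look like, modeled on the existing `LLM_ARCH_DEEPSEEK2` case in `llama-model.cpp`. This is an illustrative excerpt, not the PR's actual code: the `LLM_KV_*` keys shown are the ones llama.cpp already uses for DeepSeek-style MoE models, and the per-size `type` mapping is left open until conversion is working.

```cpp
// llama-model.cpp (sketch): hypothetical load_hparams case for the new
// arch. n_expert / n_expert_used are read generically before this switch,
// so only the DeepSeek-style MoE extras need explicit handling here.
case LLM_ARCH_GLM4_MOE:
    {
        ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);

        // MoE shape: leading dense layers, per-expert FFN width,
        // shared experts, and the routed-expert weight scale
        ml.get_key(LLM_KV_LEADING_DENSE_BLOCK_COUNT,  hparams.n_layer_dense_lead);
        ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);
        ml.get_key(LLM_KV_EXPERT_SHARED_COUNT,        hparams.n_expert_shared);
        ml.get_key(LLM_KV_EXPERT_WEIGHTS_SCALE,       hparams.expert_weights_scale);

        switch (hparams.n_layer) {
            // placeholder until converted GGUFs confirm the layer counts
            default: type = LLM_TYPE_UNKNOWN;
        }
    } break;
```

The `load_tensors` case and `llm_build_glm4_moe` would then presumably follow the DeepSeek-V2/V3 pattern: wire up the shared expert and `build_moe_ffn` with sigmoid gating, plus the `attn_q_norm`/`attn_k_norm` tensors mentioned in the commit list.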