fix: restore MiniCPM inference after Granite Four changes #14850
This commit fixes MiniCPM model inference, which was broken by the Granite Four PR (#13550). The issue had two parts:

1. The LLM_KV_ATTENTION_LAYER_INDICES enum value had been removed, shifting the numeric value of every subsequent enumerator and breaking model metadata parsing (see the first sketch below).
2. The MiniCPM architecture builds its graph through llm_build_granite, which was changed to read hparams.rope_finetuned instead of taking a use_rope parameter, but MiniCPM models were not setting this flag correctly (see the second sketch, after the change list).
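
To illustrate part 1, here is a minimal, self-contained C++ sketch. The enumerator names (KV_FOO, KV_LAYER_INDICES, KV_BAR) are hypothetical stand-ins, not the real llama.cpp keys; the point is that removing a middle enumerator silently renumbers everything after it, so anything built against the old numeric values resolves later keys incorrectly:

```cpp
#include <cstdio>

// Hypothetical before/after enums -- illustrative only, not the actual
// llama.cpp definitions.
enum llm_kv_before {
    KV_FOO,            // 0
    KV_LAYER_INDICES,  // 1
    KV_BAR,            // 2  <- lookups and tables assume this value
};

enum llm_kv_after {
    KV2_FOO,           // 0
    // KV_LAYER_INDICES removed: every later enumerator is renumbered
    KV2_BAR,           // 1  <- same name, different numeric value
};

int main() {
    // Any mapping built against the old numeric values now resolves
    // KV_BAR's metadata under the wrong key.
    std::printf("KV_BAR before: %d, after: %d\n", (int) KV_BAR, (int) KV2_BAR);
    return 0;
}
```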
Changes:

- Restore the LLM_KV_ATTENTION_LAYER_INDICES enum value so the remaining metadata keys keep their original positions.
- Set the RoPE flag correctly for MiniCPM so llm_build_granite applies rotary position embeddings again.

Together these restore inference output from gibberish to correct model responses.
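
The second sketch shows the shape of the behavior change in part 2, with deliberately simplified names (llm_hparams_sketch, build_attn_old, build_attn_new are stand-ins; the real llama.cpp signatures differ). When the builder switched from an explicit use_rope argument to reading hparams.rope_finetuned, any architecture whose loader never set that flag silently stopped applying RoPE:

```cpp
#include <cstdio>

// Hedged sketch with simplified names -- not the real llama.cpp API.
struct llm_hparams_sketch {
    bool rope_finetuned = false; // stays false if the loader never sets it
};

// Before #13550 (simplified): the caller decided explicitly.
static void build_attn_old(bool use_rope) {
    std::printf("old path: rope %s\n", use_rope ? "applied" : "skipped");
}

// After #13550 (simplified): the builder consults the hyperparameter.
static void build_attn_new(const llm_hparams_sketch & hparams) {
    std::printf("new path: rope %s\n",
                hparams.rope_finetuned ? "applied" : "skipped");
}

int main() {
    build_attn_old(/*use_rope=*/true);  // MiniCPM effectively passed true here

    llm_hparams_sketch hp;              // loader never set rope_finetuned...
    build_attn_new(hp);                 // ...so RoPE is skipped -> gibberish

    hp.rope_finetuned = true;           // the fix: set the flag for MiniCPM
    build_attn_new(hp);
    return 0;
}
```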
Tested with a MiniCPM 0.5B model, which now shows proper inference:

Input: "你好" ("Hello")
Output: "你好,我是MiniCPM系列模型,由面壁智能和OpenBMB开源社区开发。详细信息请访问 https://github.com/OpenBMB/ [end of text]"
(English: "Hello, I am a MiniCPM-series model, developed by ModelBest (面壁智能) and the OpenBMB open-source community. For details, please visit https://github.com/OpenBMB/")