
Commit 19594e5

fix: restore MiniCPM inference after Granite Four changes
This commit fixes MiniCPM model inference that was broken by the Granite Four PR (#13550). The issue had two parts:

1. The LLM_KV_ATTENTION_LAYER_INDICES enum value was removed, causing the enum ordering to shift and breaking model metadata parsing.
2. The MiniCPM architecture uses llm_build_granite, which was changed to read hparams.rope_finetuned instead of a use_rope parameter, but MiniCPM models were not setting this flag correctly.

Changes:
- Restore the LLM_KV_ATTENTION_LAYER_INDICES enum value and its string mapping
- Set hparams.rope_finetuned = true for the MiniCPM architecture

This fixes inference output from gibberish to correct model responses. Tested with a MiniCPM 0.5B model showing proper inference:

Input: "你好" ("Hello")
Output: "你好,我是MiniCPM系列模型,由面壁智能和OpenBMB开源社区开发。详细信息请访问 https://github.com/OpenBMB/" ("Hello, I am a MiniCPM-series model, developed by ModelBest and the OpenBMB open-source community. For details, please visit https://github.com/OpenBMB/") [end of text]
1 parent 4ec6291 commit 19594e5

File tree

3 files changed (+5, -0 lines)


src/llama-arch.cpp (1 addition, 0 deletions)

@@ -160,6 +160,7 @@ static const std::map<llm_kv, const char *> LLM_KV_NAMES = {
     { LLM_KV_ATTENTION_SCALE,            "%s.attention.scale" },
     { LLM_KV_ATTENTION_KEY_LENGTH_MLA,   "%s.attention.key_length_mla" },
     { LLM_KV_ATTENTION_VALUE_LENGTH_MLA, "%s.attention.value_length_mla" },
+    { LLM_KV_ATTENTION_LAYER_INDICES,    "%s.attention.layer_indices" },

     { LLM_KV_ROPE_DIMENSION_COUNT,       "%s.rope.dimension_count" },
     { LLM_KV_ROPE_DIMENSION_SECTIONS,    "%s.rope.dimension_sections" },

src/llama-arch.h (1 addition, 0 deletions)

@@ -164,6 +164,7 @@ enum llm_kv {
     LLM_KV_ATTENTION_SCALE,
     LLM_KV_ATTENTION_KEY_LENGTH_MLA,
     LLM_KV_ATTENTION_VALUE_LENGTH_MLA,
+    LLM_KV_ATTENTION_LAYER_INDICES,

     LLM_KV_ROPE_DIMENSION_COUNT,
     LLM_KV_ROPE_DIMENSION_SECTIONS,

src/llama-model.cpp (3 additions, 0 deletions)

@@ -646,6 +646,9 @@ void llama_model::load_hparams(llama_model_loader & ml) {
     ml.get_key(LLM_KV_RESIDUAL_SCALE, hparams.f_residual_scale);
     ml.get_key(LLM_KV_LOGIT_SCALE,    hparams.f_logit_scale);

+    // MiniCPM uses rope by default, unlike Granite which uses it as a switch
+    hparams.rope_finetuned = true;
+
     switch (hparams.n_layer) {
         case 52: type = LLM_TYPE_1B; break;
         case 40: type = LLM_TYPE_2B; break;

0 commit comments