
Commit 19594e5

fix: restore MiniCPM inference after Granite Four changes
This commit fixes MiniCPM model inference that was broken by the Granite Four PR (#13550). The issue had two parts:

1. The LLM_KV_ATTENTION_LAYER_INDICES enum value was removed, causing the enum ordering to shift and breaking model metadata parsing.
2. The MiniCPM architecture uses llm_build_granite, which was changed to read hparams.rope_finetuned instead of a use_rope parameter, but MiniCPM models were not setting this flag correctly.

Changes:
- Restore the LLM_KV_ATTENTION_LAYER_INDICES enum value and its string mapping
- Set hparams.rope_finetuned = true for the MiniCPM architecture

This fixes inference output from gibberish to correct model responses. Tested with a MiniCPM 0.5B model showing proper inference:

Input: "你好" ("Hello")
Output: "你好,我是MiniCPM系列模型,由面壁智能和OpenBMB开源社区开发。详细信息请访问 https://github.com/OpenBMB/" ("Hello, I am a MiniCPM-series model, developed by ModelBest and the OpenBMB open-source community. For details, please visit https://github.com/OpenBMB/") [end of text]
1 parent 4ec6291 commit 19594e5

File tree

3 files changed (+5, -0 lines)


src/llama-arch.cpp (1 addition, 0 deletions)

@@ -160,6 +160,7 @@ static const std::map<llm_kv, const char *> LLM_KV_NAMES = {
     { LLM_KV_ATTENTION_SCALE,            "%s.attention.scale" },
     { LLM_KV_ATTENTION_KEY_LENGTH_MLA,   "%s.attention.key_length_mla" },
     { LLM_KV_ATTENTION_VALUE_LENGTH_MLA, "%s.attention.value_length_mla" },
+    { LLM_KV_ATTENTION_LAYER_INDICES,    "%s.attention.layer_indices" },

     { LLM_KV_ROPE_DIMENSION_COUNT,       "%s.rope.dimension_count" },
     { LLM_KV_ROPE_DIMENSION_SECTIONS,    "%s.rope.dimension_sections" },

src/llama-arch.h (1 addition, 0 deletions)

@@ -164,6 +164,7 @@ enum llm_kv {
     LLM_KV_ATTENTION_SCALE,
     LLM_KV_ATTENTION_KEY_LENGTH_MLA,
     LLM_KV_ATTENTION_VALUE_LENGTH_MLA,
+    LLM_KV_ATTENTION_LAYER_INDICES,

     LLM_KV_ROPE_DIMENSION_COUNT,
     LLM_KV_ROPE_DIMENSION_SECTIONS,

src/llama-model.cpp (3 additions, 0 deletions)

@@ -646,6 +646,9 @@ void llama_model::load_hparams(llama_model_loader & ml) {
     ml.get_key(LLM_KV_RESIDUAL_SCALE, hparams.f_residual_scale);
     ml.get_key(LLM_KV_LOGIT_SCALE,    hparams.f_logit_scale);

+    // MiniCPM uses rope by default, unlike Granite which uses it as a switch
+    hparams.rope_finetuned = true;
+
     switch (hparams.n_layer) {
         case 52: type = LLM_TYPE_1B; break;
         case 40: type = LLM_TYPE_2B; break;

0 commit comments