Fix kq_scale for the attention layers of PLaMo2 #14892
Conversation
Thanks for following up! :)
ggml_view_1d(ctx0, ssm_states_all, d_state*d_inner*n_seqs,
             kv_head*d_state*d_inner*ggml_element_size(ssm_states_all))));

ggml_view_1d(ctx0, y_ssm, n_heads*head_dim*d_state*n_seqs, n_heads*head_dim*n_seq_tokens*n_seqs*ggml_element_size(y_ssm)),
ggml_view_1d(ctx0, ssm_states_all, n_heads*head_dim*d_state*n_seqs, kv_head*n_seqs*n_heads*head_dim*d_state*ggml_element_size(ssm_states_all))));
Mind commenting on these changes?
These changes replace d_inner with n_heads*head_dim for both y_ssm and ssm_states_all. I thought this was more natural because the ssm state created in get_ssm_rows() has the shape (d_state, head_dim, n_heads, n_seqs) here:

Line 16359 in 446595b
ggml_tensor * ssm = ggml_reshape_4d(ctx, states, d_state, head_dim, n_heads, mctx_cur->get_size());

But if d_inner is always the same as n_heads*head_dim, I'm happy to revert this change because it's unnecessary.
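To illustrate the equivalence in question: whenever d_inner equals n_heads*head_dim, the old and new size expressions describe exactly the same number of elements, which also matches the element count of the 4D ssm state above. A minimal standalone sketch with hypothetical dimensions (not taken from the real model config):

// Hypothetical dimensions, chosen only for illustration.
#include <cassert>
#include <cstdint>

int main() {
    const int64_t d_state = 64, head_dim = 128, n_heads = 32, n_seqs = 4;
    const int64_t d_inner = n_heads * head_dim;  // the assumption being discussed

    // Old expression (d_inner based) vs. new expression (n_heads*head_dim based).
    const int64_t old_ne = d_state * d_inner * n_seqs;
    const int64_t new_ne = n_heads * head_dim * d_state * n_seqs;
    assert(old_ne == new_ne);

    // Both match the element count of the ssm state reshaped to
    // (d_state, head_dim, n_heads, n_seqs) in get_ssm_rows().
    assert(new_ne == d_state * head_dim * n_heads * n_seqs);
    return 0;
}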
Co-authored-by: Sigbjørn Skjæret <[email protected]>
This PR fixes the scale given to build_attn() in the PLaMo2 model part. It should be 1.0f/sqrtf(float(n_embd_head_v)), but it was just 1.0, which significantly reduced the output quality of the model. This PR also fixes some default values used in convert_hf_to_gguf.py for PLaMo2 to align them with the defaults used in modeling_plamo.py of PLaMo2.
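As a rough illustration of the fix (a sketch, not the actual llama.cpp code path; n_embd_head_v here is a hypothetical head size): scaled dot-product attention multiplies the Q·K^T logits by kq_scale, which should be 1/sqrt of the value-head dimension rather than 1.0.

// Sketch only: n_embd_head_v is a hypothetical head size for illustration.
#include <cmath>
#include <cstdio>

int main() {
    const int n_embd_head_v = 128;                                       // hypothetical
    const float kq_scale_old = 1.0f;                                     // value before this PR
    const float kq_scale_new = 1.0f / std::sqrt((float) n_embd_head_v);  // value the PR passes to build_attn()

    std::printf("old kq_scale = %.6f, fixed kq_scale = %.6f\n", kq_scale_old, kq_scale_new);
    return 0;
}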