
Fix kq_scale for the attention layers of PLaMo2 #14892


Merged: 7 commits merged into ggml-org:master from mitmul/fix-build-attn-scale-plamo2 on Jul 27, 2025

Conversation

@mitmul (Contributor) commented Jul 26, 2025

This PR fixes the scale given to build_attn() in the PLaMo2 model code.
It should be 1.0f/sqrtf(float(n_embd_head_v)), but it was hard-coded to 1.0, which significantly reduces the output quality of the model.
This PR also fixes some default values used in convert_hf_to_gguf.py for PLaMo2 so that they match the defaults in PLaMo2's modeling_plamo.py.
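For reference, a minimal sketch of the scale computation described above; the standalone helper and its name are illustrative assumptions, not the exact llama.cpp code, which passes this value directly to build_attn():

#include <math.h>

// Illustrative helper: the attention scale for PLaMo2.
// Before this PR the value passed to build_attn() was effectively 1.0f;
// the fix applies the standard 1/sqrt(head_dim) scaling instead.
static float plamo2_kq_scale(int n_embd_head_v) {
    return 1.0f / sqrtf((float) n_embd_head_v);
}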

@github-actions bot added the python (python script changes) label on Jul 26, 2025
@CISC (Collaborator) left a comment


Thanks for following up! :)

ggml_view_1d(ctx0, ssm_states_all, d_state*d_inner*n_seqs,
kv_head*d_state*d_inner*ggml_element_size(ssm_states_all))));
ggml_view_1d(ctx0, y_ssm, n_heads*head_dim*d_state*n_seqs, n_heads*head_dim*n_seq_tokens*n_seqs*ggml_element_size(y_ssm)),
ggml_view_1d(ctx0, ssm_states_all, n_heads*head_dim*d_state*n_seqs, kv_head*n_seqs*n_heads*head_dim*d_state*ggml_element_size(ssm_states_all))));
@CISC (Collaborator):


Mind commenting on these changes?

@mitmul (Contributor, Author):


These changes replace d_inner with n_heads*head_dim for both y_ssm and ssm_states_all. I made this change because it seemed more natural: the ssm state created in get_ssm_rows() has the shape (d_state, head_dim, n_heads, n_seqs) here:

ggml_tensor * ssm = ggml_reshape_4d(ctx, states, d_state, head_dim, n_heads, mctx_cur->get_size());

But if d_inner is always equal to n_heads*head_dim, I'm happy to revert this change since it's unnecessary.
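As an illustration of why the two spellings are interchangeable, here is a small sketch (a hypothetical helper, not llama.cpp code) showing that both size expressions for the 1-D views cover the same number of elements whenever d_inner == n_heads*head_dim:

#include <assert.h>
#include <stdint.h>

// Hypothetical helper: compares the two ways the view size can be written.
static int64_t ssm_view_n_elems(int64_t d_state, int64_t n_heads, int64_t head_dim, int64_t n_seqs) {
    const int64_t d_inner     = n_heads * head_dim;                     // PLaMo2 assumption
    const int64_t via_d_inner = d_state * d_inner * n_seqs;             // original expression
    const int64_t via_heads   = n_heads * head_dim * d_state * n_seqs;  // expression used in this PR
    assert(via_d_inner == via_heads);                                   // identical element count
    return via_heads;
}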

mitmul and others added 3 commits July 27, 2025 10:26
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
@CISC merged commit 1dc9614 into ggml-org:master on Jul 27, 2025
50 checks passed
@mitmul deleted the mitmul/fix-build-attn-scale-plamo2 branch on July 27, 2025 at 07:51