Fix kq_scale for the attention layers of PLaMo2 #14892
Conversation
Thanks for following up! :)
ggml_view_1d(ctx0, ssm_states_all, d_state*d_inner*n_seqs,
             kv_head*d_state*d_inner*ggml_element_size(ssm_states_all))));

ggml_view_1d(ctx0, y_ssm, n_heads*head_dim*d_state*n_seqs, n_heads*head_dim*n_seq_tokens*n_seqs*ggml_element_size(y_ssm)),
ggml_view_1d(ctx0, ssm_states_all, n_heads*head_dim*d_state*n_seqs, kv_head*n_seqs*n_heads*head_dim*d_state*ggml_element_size(ssm_states_all))));
Mind commenting on these changes?
These changes replace d_inner with n_heads*head_dim for both y_ssm and ssm_states_all. I thought this was more natural because the ssm state created in get_ssm_rows() has the shape (d_state, head_dim, n_heads, n_seqs) here:

Line 16359 in 446595b
ggml_tensor * ssm = ggml_reshape_4d(ctx, states, d_state, head_dim, n_heads, mctx_cur->get_size());

But if d_inner is always the same as n_heads*head_dim, I'm happy to revert this change because it's unnecessary.
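To illustrate the equivalence in question: whenever d_inner equals n_heads*head_dim, the old and new size expressions describe exactly the same number of elements, which also matches the element count of the 4D ssm state above. A minimal standalone sketch with hypothetical dimensions (not taken from the real model config):

// Hypothetical dimensions, chosen only for illustration.
#include <cassert>
#include <cstdint>

int main() {
    const int64_t d_state = 64, head_dim = 128, n_heads = 32, n_seqs = 4;
    const int64_t d_inner = n_heads * head_dim;  // the assumption being discussed

    // Old expression (d_inner based) vs. new expression (n_heads*head_dim based).
    const int64_t old_ne = d_state * d_inner * n_seqs;
    const int64_t new_ne = n_heads * head_dim * d_state * n_seqs;
    assert(old_ne == new_ne);

    // Both match the element count of the ssm state reshaped to
    // (d_state, head_dim, n_heads, n_seqs) in get_ssm_rows().
    assert(new_ne == d_state * head_dim * n_heads * n_seqs);
    return 0;
}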
Co-authored-by: Sigbjørn Skjæret <[email protected]>
This PR fixes the scale given to build_attn() in the PLaMo2 model part. It should be 1.0f/sqrtf(float(n_embd_head_v)), but it was just 1.0, which significantly reduced the output quality of the model. This PR also fixes some default values used in convert_hf_to_gguf.py for PLaMo2 to align them with the defaults used in modeling_plamo.py of PLaMo2.
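As a rough illustration of the fix (a sketch, not the actual llama.cpp code path; n_embd_head_v here is a hypothetical head size): scaled dot-product attention multiplies the Q·K^T logits by kq_scale, which should be 1/sqrt of the value-head dimension rather than 1.0.

// Sketch only: n_embd_head_v is a hypothetical head size for illustration.
#include <cmath>
#include <cstdio>

int main() {
    const int n_embd_head_v = 128;                                       // hypothetical
    const float kq_scale_old = 1.0f;                                     // value before this PR
    const float kq_scale_new = 1.0f / std::sqrt((float) n_embd_head_v);  // value the PR passes to build_attn()

    std::printf("old kq_scale = %.6f, fixed kq_scale = %.6f\n", kq_scale_old, kq_scale_new);
    return 0;
}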