imatrix : fix 3d activation handling for hybrid and recurrent models #14994
Fixes #14979, and also implements follow-up simplification from #9400 (comment).
The problem affected only recurrent and hybrid models when multiple sequences are processed at once.
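For recurrent and hybrid models, the activations feeding a `MUL_MAT` can be a 3d tensor of shape `{n_embd, n_seq_tokens, n_seqs}`. One way such a bug can show up is a loop that treats the tensor as 2d: it walks only the `n_seq_tokens` rows and silently skips every sequence after the first. The following is a minimal sketch of that failure mode, not the llama.cpp implementation. The function names `accumulate_sq` and `accumulate_sq_2d_only` are hypothetical. The sketch assumes a flat, contiguous `float` buffer in ggml order (`ne0` varies fastest), and it assumes the collected statistic is the per-channel sum of squared activations.

```python
from typing import List, Sequence


def accumulate_sq(data: Sequence[float], ne0: int, ne1: int, ne2: int,
                  sums: List[float]) -> int:
    """Hypothetical helper: add per-channel sums of squared activations.

    `data` is assumed to be a contiguous buffer for a tensor of shape
    {ne0, ne1, ne2} = {n_embd, n_seq_tokens, n_seqs}, with ne0 fastest.
    Every token of every sequence is visited. Returns the rows visited.
    """
    assert len(data) == ne0 * ne1 * ne2
    assert len(sums) == ne0
    rows = 0
    for s in range(ne2):          # every sequence...
        for t in range(ne1):      # ...and every token within it
            base = (s * ne1 + t) * ne0
            for j in range(ne0):  # every channel (n_embd)
                v = data[base + j]
                sums[j] += v * v
            rows += 1
    return rows


def accumulate_sq_2d_only(data: Sequence[float], ne0: int, ne1: int, ne2: int,
                          sums: List[float]) -> int:
    """Hypothetical buggy variant: ignores ne2, so only sequence 0 is seen."""
    rows = 0
    for t in range(ne1):
        base = t * ne0
        for j in range(ne0):
            v = data[base + j]
            sums[j] += v * v
        rows += 1
    return rows


# 2 channels, 2 tokens per sequence, 3 sequences -> 6 rows of [1.0, 2.0].
data = [1.0, 2.0] * 6
full = [0.0, 0.0]
partial = [0.0, 0.0]
print(accumulate_sq(data, 2, 2, 3, full), full)                # 6 [6.0, 24.0]
print(accumulate_sq_2d_only(data, 2, 2, 3, partial), partial)  # 2 [2.0, 8.0]
```

With three sequences batched together, the 2d-only loop sees a third of the rows. Its statistics therefore depend on how tokens happen to be split into sequences, which matches the symptom of the problem appearing only when multiple sequences are processed at once.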
Changes:
- … `MUL_MAT`, even 3d tensors (`{n_embd, n_seq_tokens, n_seqs}`)

I've tested imatrix generation and quantization for … (`wiki.train.raw`; perplexity tested on 10 chunks of `wiki.test.raw`: `27.8220 +/- 1.65489` at `Q4_K` vs `27.0357 +/- 1.60664` at `BF16`. For comparison, `Q4_K` without an imatrix results in a PPL of `28.1603 +/- 1.67166` under the same conditions.)
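To put the perplexity figures in relative terms, the short calculation below compares each `Q4_K` result against the `BF16` baseline. It also estimates how much of the plain `Q4_K` degradation the imatrix recovers. The sketch uses only the three PPL values reported above. It ignores the reported `+/-` uncertainties, so it is a back-of-the-envelope comparison and not a significance test. The helper name `rel_increase` is hypothetical.

```python
def rel_increase(ppl: float, ref: float) -> float:
    """Hypothetical helper: relative PPL increase of `ppl` over `ref`, in percent."""
    return (ppl / ref - 1.0) * 100.0


bf16 = 27.0357         # BF16 baseline PPL
q4k_imatrix = 27.8220  # Q4_K with imatrix
q4k_plain = 28.1603    # Q4_K without imatrix

with_imatrix = rel_increase(q4k_imatrix, bf16)
without_imatrix = rel_increase(q4k_plain, bf16)
# Fraction of the (Q4_K without imatrix) - BF16 gap removed by using the imatrix.
gap_closed = 1.0 - (q4k_imatrix - bf16) / (q4k_plain - bf16)

print(f"Q4_K + imatrix: +{with_imatrix:.2f}% vs BF16")
print(f"Q4_K plain:     +{without_imatrix:.2f}% vs BF16")
print(f"share of the Q4_K gap closed by the imatrix: {gap_closed:.0%}")
```

On these numbers, `Q4_K` is about 2.9% above `BF16` with the imatrix and about 4.2% above without it. The imatrix therefore removes roughly 30% of the gap.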