imatrix : fix 3d activation handling for hybrid and recurrent models #14994

Merged
merged 4 commits into from
Aug 3, 2025

compilade
Collaborator

Fixes #14979, and also implements follow-up simplification from #9400 (comment).

The problem affected only recurrent and hybrid models when multiple sequences are processed at once.
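For illustration, here is a minimal sketch of how 3d activations could be folded into per-matrix statistics. It is not the code from this PR: `Shape3`, `ImatrixStats`, and `accumulate` are hypothetical names, and the broadcast rule (activation slice `i2` maps to weight matrix `i2 % n_mat`) is an assumption about the intended behavior. With a 2d weight, every sequence slice lands in the same matrix, so the row count grows with the number of sequences instead of triggering a size mismatch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor shape:
// ne[0] = row size, ne[1] = rows (tokens), ne[2] = matrices or sequences.
struct Shape3 { std::array<int64_t, 3> ne; };

struct ImatrixStats {
    std::vector<float>   sums;   // sum of squared activations, n_mat * row_size entries
    std::vector<int64_t> counts; // rows accumulated per weight matrix
};

// Accumulate squared activations `x` (contiguous, shape `act`) for a weight of shape `w`.
// Assumption: activation slice i2 contributes to weight matrix (i2 % w.ne[2]).
inline void accumulate(ImatrixStats & st, const Shape3 & w, const Shape3 & act,
                       const std::vector<float> & x) {
    const int64_t row_size = act.ne[0];
    const int64_t n_rows   = act.ne[1];
    const int64_t n_mat    = w.ne[2];

    assert(w.ne[0] == row_size);         // rows must match the weight's input size
    assert(act.ne[2] % n_mat == 0);      // slices must broadcast evenly over matrices
    assert((int64_t) x.size() == row_size * n_rows * act.ne[2]);

    if (st.sums.empty()) {
        st.sums.assign((size_t) (n_mat * row_size), 0.0f);
        st.counts.assign((size_t) n_mat, 0);
    }

    for (int64_t i2 = 0; i2 < act.ne[2]; ++i2) {
        const int64_t mat_id = i2 % n_mat; // 2d weight: always matrix 0
        for (int64_t i1 = 0; i1 < n_rows; ++i1) {
            const float * row = x.data() + (i2 * n_rows + i1) * row_size;
            for (int64_t j = 0; j < row_size; ++j) {
                st.sums[(size_t) (mat_id * row_size + j)] += row[j] * row[j];
            }
            st.counts[(size_t) mat_id] += 1;
        }
    }
}
```

In this sketch, a 2d weight with activations of shape `{2, 3, 2}` (two sequences of three tokens) ends up with a single matrix and six counted rows, while a weight with `ne[2] == 2` keeps two matrices with three rows each.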

Changes:

I've tested imatrix generation and quantization for



Collaborator

@CISC CISC left a comment

I feel we need some good tests for imatrix, maybe add that to #14139?

Edit: Though not practical on random models, just thinking layout-wise maybe it's possible to do something...

@compilade
Collaborator Author

> I feel we need some good tests for imatrix, maybe add that to #14139?

@CISC
I agree that more comprehensive imatrix tests would be useful. test-model-random might not be the right place for this, unless it could either call other binaries or be used to generate a specific random model that an external script then runs other tests against. The only problem with that is deciding what should happen when an architecture has multiple variants (e.g. llama, which can sometimes be MoE).

> Edit: Though not practical on random models, just thinking layout-wise maybe it's possible to do something...

It would be nice to be able to statically check correctness of the shapes, but it doesn't seem simple. It would almost require a DSL and/or some way to track relationships and constraints between shapes. Run-time tests until then.
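As a rough idea of what such a run-time check could look like, here is a small sketch. The names `TensorShape` and `check_activation_shape` are hypothetical, and the two constraints (equal row size, higher activation dimensions being a multiple of the weight's) are assumptions for illustration, not the checks llama-imatrix actually performs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical 4d shape, innermost dimension first.
struct TensorShape { int64_t ne[4]; };

// Returns an empty string when activations of shape `act` can be accumulated
// for a weight of shape `w`, otherwise a short description of the violation.
inline std::string check_activation_shape(const TensorShape & w, const TensorShape & act) {
    if (act.ne[0] != w.ne[0]) {
        return "row size differs from the weight's input size";
    }
    for (int d = 2; d < 4; ++d) {
        // Higher activation dims must repeat the weight's dims a whole number of times.
        if (w.ne[d] <= 0 || act.ne[d] % w.ne[d] != 0) {
            return "dimension " + std::to_string(d) + " is not broadcastable";
        }
    }
    return "";
}
```

A check like this only catches mismatches for the inputs that are actually run, which is why it complements rather than replaces a static analysis of shapes.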


I've tested that the activation counts make sense for both MLA 3d tensors, and 3d recurrent activations. So I consider this ready to merge.

Note that without #15050, hybrid models crash llama-imatrix by default (except when using -kvu). That's not yet a problem on this branch because it doesn't yet include the changes from #14959, but it could be a problem on master (temporarily), depending on the merge order.

@CISC CISC merged commit 0a2f549 into master Aug 3, 2025
47 checks passed
Labels
bugfix fixes an issue or bug examples
Development

Successfully merging this pull request may close these issues.

Eval bug: imatrix generation for LFM2 fails; collect_imatrix: inconsistent size for blk.0.shortconv.in_proj.weight