imatrix : fix 3d activation handling for hybrid and recurrent models #14994

compilade · 2025-07-31T16:41:08Z

Fixes #14979, and also implements the follow-up simplification from #9400 (comment).

The problem only affected recurrent and hybrid models, and only when multiple sequences were processed at once.

Changes:

- Use a single count for tensors used with MUL_MAT, even 3d tensors
  - This is a follow-up from imatrix : use GGUF to store importance matrices #9400 (comment)
- Correctly handle 3D shapes for recurrent activations (e.g. {n_embd, n_seq_tokens, n_seqs})
  - See also Eval bug: imatrix generation for LFM2 fails; collect_imatrix: inconsistent size for blk.0.shortconv.in_proj.weight #14979 (comment)
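
As a rough illustration of the two changes listed above, here is a minimal C++ sketch. It is not the code from this PR: `ActStats`, `accumulate`, and all parameter names are hypothetical, and the activations are assumed to be contiguous floats with `n_embd` as the fastest-varying dimension. The idea is that each tensor keeps a single row count, and when the model tensor is 2D (`n_mat == 1`) a 3D activation of shape {n_embd, n_seq_tokens, n_seqs} is folded into one set of per-column sums instead of being treated as several matrices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for per-tensor imatrix statistics (not the llama.cpp type).
struct ActStats {
    std::vector<float> sums;  // n_mat * n_embd sums of squared activations
    int64_t count = 0;        // one row count shared by every matrix of the tensor
};

// act has shape {n_embd, n_rows, n_outer}, stored contiguously, n_embd fastest.
// n_mat is the number of 2D matrices in the weight (assumption: 1 for a 2D weight,
// more than 1 for a dense 3D weight).
void accumulate(ActStats & st, const std::vector<float> & act,
                int64_t n_embd, int64_t n_rows, int64_t n_outer, int64_t n_mat) {
    assert((int64_t) act.size() == n_embd * n_rows * n_outer);
    assert(n_outer % n_mat == 0);
    if (st.sums.empty()) {
        st.sums.assign(n_mat * n_embd, 0.0f);
    }
    assert((int64_t) st.sums.size() == n_mat * n_embd);

    for (int64_t i2 = 0; i2 < n_outer; ++i2) {
        // With a 2D weight this is always 0, so all sequences land in the same slot.
        const int64_t mat_id = i2 % n_mat;
        for (int64_t i1 = 0; i1 < n_rows; ++i1) {
            const float * x = act.data() + (i2 * n_rows + i1) * n_embd;
            for (int64_t j = 0; j < n_embd; ++j) {
                st.sums[mat_id * n_embd + j] += x[j] * x[j];
            }
        }
    }
    // Single count: number of activation rows seen per matrix.
    st.count += n_rows * n_outer / n_mat;
}
```

In this sketch, calling `accumulate(st, act, n_embd, n_seq_tokens, n_seqs, 1)` for a recurrent activation leaves `st.sums` with `n_embd` entries and increases `st.count` by `n_seq_tokens * n_seqs`, whereas a dense 3D weight with `n_mat == n_outer` keeps one block of sums per matrix but still a single count.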

I've tested imatrix generation and quantization for:

- https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat
  - to check if 3d MLA tensors are still handled correctly
  - perplexity is the same as in imatrix : use GGUF to store importance matrices #9400
- https://huggingface.co/LiquidAI/LFM2-350M
  - to check if the problem from Eval bug: imatrix generation for LFM2 fails; collect_imatrix: inconsistent size for blk.0.shortconv.in_proj.weight #14979 is fixed
  - it doesn't fail, and the perplexity of the quantized model is reasonable. For 10 chunks of wiki.train.raw, tested on 10 chunks of wiki.test.raw: 27.8220 +/- 1.65489 at Q4_K vs 27.0357 +/- 1.60664 at BF16. Q4_K without imatrix results in a PPL of 28.1603 +/- 1.67166 in the same conditions.


CISC

I feel we need some good tests for imatrix, maybe add that to #14139?

Edit: Though not practical on random models, just thinking layout-wise maybe it's possible to do something...

compilade · 2025-08-03T05:40:09Z

> I feel we need some good tests for imatrix, maybe add that to #14139?

@CISC
I agree that more comprehensive imatrix tests would be useful. test-model-random might not be the right place for this, unless it could either call other binaries or be used to generate a specific random model for an external script to run other tests on. The only problem with that is what should happen when an architecture has multiple variants (e.g. llama, which can sometimes be MoE).

> Edit: Though not practical on random models, just thinking layout-wise maybe it's possible to do something...

It would be nice to be able to statically check correctness of the shapes, but it doesn't seem simple. It would almost require a DSL and/or some way to track relationships and constraints between shapes. Run-time tests until then.

I've tested that the activation counts make sense for both MLA 3d tensors and 3d recurrent activations, so I consider this ready to merge.

Note that without #15050, hybrid models crash llama-imatrix by default (except when using -kvu). That's not yet a problem on this branch because it doesn't yet include the changes from #14959, but it could be a problem on master (temporarily), depending on the merge order.

compilade added 4 commits July 19, 2025 12:57

- 73439be imatrix : use a single count for dense 3d tensors
- d4f36e5 imatrix : fix 3d activations when model tensor is 2d
- 05beb07 Merge branch 'master' into compilade/imatrix-saner-3d
- 91e67b8 imatrix : fix 3d tensor counts

compilade mentioned this pull request Jul 31, 2025

Eval bug: imatrix generation for LFM2 fails; collect_imatrix: inconsistent size for blk.0.shortconv.in_proj.weight #14979

Closed

compilade added the bugfix (fixes an issue or bug) label Jul 31, 2025

github-actions bot added the examples label Jul 31, 2025

CISC approved these changes Jul 31, 2025


CISC merged commit 0a2f549 into master Aug 3, 2025
47 checks passed
