get_rows & dequantize function implementation for repacked weights of type q6_K (q6_Kx8) #16743

swetha097 · 2025-10-23T14:40:14Z

NOTE: The changes required for whisper.cpp are submitted as a PR here, since llama.cpp already includes test-backend-op.

This implements the GGML_OP_GET_ROWS operation specifically for repacked (block interleaved) 6-bit quantized format (q6_Kx8).
Gain observed from the changes in this PR: they allow increased use of the GEMM function (ggml_gemm_q6_K_8x8_q8_0) for the q6_K type.
The PR was tested with whisper on an AMD Raphael 7600X, which reports the following flags:
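
To illustrate why repacked weights need their own get_rows path, here is a minimal sketch using a deliberately simplified toy format. It is not the code in this PR and not the real q6_Kx8 layout: `ToyBlockX8`, `toy_repack`, `toy_get_row`, the 4-value block size, the per-value interleaving, and the single float scale per row are all hypothetical. The actual q6_K data (6-bit values split into low and high bit planes, sub-block scales, and a super-block scale) is more involved. The sketch only shows the indexing idea: once 8 rows are interleaved into one block, a single row is no longer contiguous in memory, so get_rows has to locate the row's group and its lane within that group, then gather and dequantize with a stride.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy format, NOT the real q6_Kx8 layout.
constexpr std::size_t kRows  = 8;  // rows interleaved per group
constexpr std::size_t kBlock = 4;  // values per row per block (toy size)

struct ToyBlockX8 {
    std::array<float, kRows> d;                    // one scale per interleaved row
    std::array<std::int8_t, kRows * kBlock> qs;    // qs[i * kRows + lane] = value i of row `lane`
};

// Interleave groups of kRows rows: block (g, b) holds columns
// [b * kBlock, (b + 1) * kBlock) of rows [g * kRows, (g + 1) * kRows).
std::vector<ToyBlockX8> toy_repack(const std::vector<std::vector<std::int8_t>>& q,
                                   const std::vector<float>& scales) {
    const std::size_t n_rows = q.size();
    assert(n_rows > 0 && n_rows % kRows == 0 && scales.size() == n_rows);
    const std::size_t n_cols = q[0].size();
    assert(n_cols % kBlock == 0);
    const std::size_t blocks_per_row = n_cols / kBlock;

    std::vector<ToyBlockX8> out((n_rows / kRows) * blocks_per_row);
    for (std::size_t g = 0; g < n_rows / kRows; ++g) {
        for (std::size_t b = 0; b < blocks_per_row; ++b) {
            ToyBlockX8& blk = out[g * blocks_per_row + b];
            for (std::size_t lane = 0; lane < kRows; ++lane) {
                const std::size_t r = g * kRows + lane;
                blk.d[lane] = scales[r];
                for (std::size_t i = 0; i < kBlock; ++i) {
                    blk.qs[i * kRows + lane] = q[r][b * kBlock + i];
                }
            }
        }
    }
    return out;
}

// get_rows for one row: find the group and lane, then gather with stride kRows
// and dequantize with that lane's scale.
std::vector<float> toy_get_row(const std::vector<ToyBlockX8>& blocks,
                               std::size_t n_cols, std::size_t row) {
    assert(n_cols % kBlock == 0);
    const std::size_t blocks_per_row = n_cols / kBlock;
    const std::size_t group = row / kRows;  // which interleaved group of 8 rows
    const std::size_t lane  = row % kRows;  // position of the row inside that group
    assert((group + 1) * blocks_per_row <= blocks.size());

    std::vector<float> out(n_cols);
    for (std::size_t b = 0; b < blocks_per_row; ++b) {
        const ToyBlockX8& blk = blocks[group * blocks_per_row + b];
        for (std::size_t i = 0; i < kBlock; ++i) {
            out[b * kBlock + i] = blk.d[lane] * static_cast<float>(blk.qs[i * kRows + lane]);
        }
    }
    return out;
}
```

In this toy version, calling `toy_get_row(blocks, n_cols, 11)` reads lane 3 of group 1 in every block of that group, which is the kind of strided access a plain (non-repacked) get_rows does not have to perform.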
system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

master branch commit - swetha097/whisper.cpp@fc45bb8

q6K repacking commit (block interleaving approach for Q6_K quantization for x64/x86 SIMD architecture) - swetha097/whisper.cpp@d89aaf2

development (get_rows) branch commit - swetha097/whisper.cpp@de9839e

Model for performance tests: downloaded from https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base.en.bin and quantized to q6_K

This patch was also tested with the llama.cpp repository, and the perplexity of Q6_K models was verified to be the same before and after the changes:

Final estimate: PPL = 5.3669 +/- 0.13305

The model used for the perplexity test was quantized from https://huggingface.co/meta-llama/Llama-2-7b

This PR is to be merged after: Q6_K - Block Interleaving Implementation for x86 SIMD (AVX512/AVX2) #15275

swetha097 added 2 commits October 23, 2025 02:17

q6K get_rows & dequantize function

8ffdaea

Resolve PR comments

d611fb4

swetha097 requested review from ggerganov and slaren as code owners October 23, 2025 14:40

github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 23, 2025
