
Conversation

@swetha097

NOTE: Creating the PR with the changes required for whisper.cpp here, since llama.cpp already includes test-backend-ops.

  • This implements the GGML_OP_GET_ROWS operation specifically for the repacked (block-interleaved) 6-bit quantized format (q6_Kx8); see the indexing sketch below.
  • The changes allow increased usage of the GEMM function (ggml_gemm_q6_K_8x8_q8_0) for the q6_K type, since tensors that are also read by GET_ROWS can now remain in the repacked layout; the observed gains are shown in the image below.
  • The PR was tested on an AMD Raphael 7600X with whisper, which reports the following flags:
    system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
(image: observed performance results)
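For orientation, here is a minimal sketch, in plain C, of the index arithmetic that a GET_ROWS kernel has to perform on a block-interleaved layout such as q6_Kx8. The block structure, the group size of 8 rows, and the float storage are simplified stand-ins for the real q6_K structures; the names and layout below are assumptions for illustration, not the PR's actual implementation.

```c
// Hedged sketch (illustrative, not the actual whisper.cpp/ggml code):
// shows the indexing a get_rows implementation must do when 8 source rows
// are block-interleaved. block_t is a simplified stand-in for a q6_K block;
// the real repacked type (q6_Kx8) interleaves 6-bit sub-blocks of 8 rows.
#include <stdint.h>
#include <string.h>

#define NROWS_INTERLEAVED 8   /* rows packed together in one group        */
#define BLOCK_SIZE        256 /* elements per block (QK_K for q6_K)       */

/* One block of one row; the real struct holds quantized 6-bit data. */
typedef struct { float vals[BLOCK_SIZE]; } block_t;

/* Copy logical row `row` out of an interleaved buffer into `dst`.
 * src layout: groups of 8 rows; within a group, block b of all 8 rows is
 * stored contiguously before block b+1 (block interleaving).             */
static void get_row_interleaved(const block_t *src, int64_t n_blocks_per_row,
                                int64_t row, float *dst) {
    const int64_t group = row / NROWS_INTERLEAVED;  /* which 8-row group   */
    const int64_t lane  = row % NROWS_INTERLEAVED;  /* position in group   */
    const block_t *base = src + group * NROWS_INTERLEAVED * n_blocks_per_row;
    for (int64_t b = 0; b < n_blocks_per_row; ++b) {
        /* block b of this lane sits at offset b*8 + lane within the group */
        const block_t *blk = base + b * NROWS_INTERLEAVED + lane;
        memcpy(dst + b * BLOCK_SIZE, blk->vals, sizeof(blk->vals));
    }
}
```

In the real kernel the inner copy would instead dequantize the lane's 6-bit sub-block (scales and quants) into the destination row, but the group/lane/block addressing is the part that differs from the non-repacked q6_K path.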

master branch commit - swetha097/whisper.cpp@fc45bb8

q6_K repacking commit (block-interleaving approach for Q6_K quantization for x64/x86 SIMD architectures) - swetha097/whisper.cpp@d89aaf2

development (get_rows) branch commit - swetha097/whisper.cpp@de9839e

Model for performance tests downloaded from https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base.en.bin and quantized to q6_K.

This patch was also tested with the llama.cpp repository, and the perplexity of Q6_K models was verified to be the same before and after the changes:

Final estimate: PPL = 5.3669 +/- 0.13305

Model used for the perplexity test was quantized from https://huggingface.co/meta-llama/Llama-2-7b

This PR is to be merged after Q6_K - Block Interleaving Implementation for x86 SIMD (AVX512/AVX2) #15275.

The github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Oct 23, 2025.