get_rows & dequantize function implementation for repacked weights of type q6_K (q6_Kx8) #16743
+169
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
NOTE: Creating the PR changes required for whisper.cpp here as llama.cpp already includes test-backend-op
system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
master branch commit - swetha097/whisper.cpp@fc45bb8
q6K repacking commit ( block interleaving approach for Q6_K quantization for x64/x86 SIMD Architecture )- swetha097/whisper.cpp@d89aaf2
development (get_rows) branch commit - swetha097/whisper.cpp@de9839e
Model for performance tests Downloaded from : https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base.en.bin and quantized to q6_K
This patch of the code was also tested with llama.cpp repository & the perplexity of Q6_K models were ensured to be the same before and after the changes made :
Final estimate: PPL = 5.3669 +/- 0.13305
Model used for perplexity test quantized from - https://huggingface.co/meta-llama/Llama-2-7b
This PR is to merged after - Q6_K - Block Interleaving Implementation for x86 SIMD (AVX512/AVX2) #15275