Releases: AmpereComputingAI/llama.cpp

v3.3.1

15 Oct 16:32
6219c16

Also available at: DockerHub

v3.3.0

09 Oct 12:54
6219c16

Also available at: DockerHub

v3.2.1

03 Sep 10:24
ecbcf6e

Also available at: DockerHub

v3.2.0

06 Aug 21:39
ecbcf6e

Also available at: DockerHub

v3.1.2

07 Jul 12:40
aa0a5d7

Also available at: DockerHub

v3.1.0

11 Jun 21:21
aa0a5d7

Also available at: DockerHub

v2.2.1

03 Jun 15:44
aa0a5d7

Update benchmark.py

v2.0.0

23 Sep 20:15
4f32b2c

  • Upgraded upstream tag enables Llama 3.1 in ollama
  • Support for the AmpereOne platform
  • Breaking change: weight type IDs have changed, so models must be re-quantized to the Q8R16 and Q4_K_4 formats with the current llama-quantize tool.
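The re-quantization step called out above can be sketched as follows. The model paths are placeholders, and this assumes the llama-quantize binary built from this repository's current tree (the standard llama.cpp invocation is `llama-quantize <input.gguf> <output.gguf> <type>`):

```shell
# Hedged sketch: re-quantize an existing F16 GGUF model into the
# Ampere-optimized formats required after the v2.0.0 weight-type-ID change.
# ./models/model-f16.gguf is a hypothetical input path.

# Q8R16 format:
./llama-quantize ./models/model-f16.gguf ./models/model-q8r16.gguf Q8R16

# Q4_K_4 format:
./llama-quantize ./models/model-f16.gguf ./models/model-q4_k_4.gguf Q4_K_4
```

Models quantized with an older llama-quantize build will not load after this release; re-run the tool from the current checkout.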

v1.2.6

16 Jul 23:03
06e1efb

Create README.md

v1.2.3

02 Jul 22:47
855aa8d

  • This rebase allows llama-cpp-python to pick up the upstream CVE fix (GHSA-56xg-wfcc-g829)
  • Experimental support for the Q8R16 quantized format with optimized matrix multiplication kernels
  • CMake files updated to build llama.aio on AmpereOne
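Building on AmpereOne should follow the usual llama.cpp CMake flow; a minimal sketch, assuming a checkout of this repository (no Ampere-specific flags are documented in these notes, so none are shown):

```shell
# Hedged sketch: standard out-of-source CMake build of llama.cpp,
# run from the repository root on an AmpereOne host.
cmake -B build
cmake --build build --config Release -j
```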