Releases: AmpereComputingAI/llama.cpp

v3.3.1

15 Oct 16:32
6219c16

Also available at: DockerHub

v3.3.0

09 Oct 12:54
6219c16

Also available at: DockerHub

v3.2.1

03 Sep 10:24
ecbcf6e

Also available at: DockerHub

v3.2.0

06 Aug 21:39
ecbcf6e

Also available at: DockerHub

v3.1.2

07 Jul 12:40
aa0a5d7

Also available at: DockerHub

v3.1.0

11 Jun 21:21
aa0a5d7

Also available at: DockerHub

v2.2.1

03 Jun 15:44
aa0a5d7

Update benchmark.py

v2.0.0

23 Sep 20:15
4f32b2c

  • Upgraded upstream tag enables Llama 3.1 in ollama
  • Support for the AmpereOne platform
  • Breaking change: weight type IDs have changed, so models must be re-quantized to the Q8R16 and Q4_K_4 formats with the current llama-quantize tool.
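The re-quantization step called out above can be sketched as follows. The model paths are placeholders, and this assumes the llama-quantize binary built from this repository's current tree (the standard llama.cpp invocation is `llama-quantize <input.gguf> <output.gguf> <type>`):

```shell
# Hedged sketch: re-quantize an existing F16 GGUF model into the
# Ampere-optimized formats required after the v2.0.0 weight-type-ID change.
# ./models/model-f16.gguf is a hypothetical input path.

# Q8R16 format:
./llama-quantize ./models/model-f16.gguf ./models/model-q8r16.gguf Q8R16

# Q4_K_4 format:
./llama-quantize ./models/model-f16.gguf ./models/model-q4_k_4.gguf Q4_K_4
```

Models quantized with an older llama-quantize build will not load after this release; re-run the tool from the current checkout.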

v1.2.6

16 Jul 23:03
06e1efb

Create README.md

v1.2.3

02 Jul 22:47
855aa8d

  • This rebase allows llama-cpp-python to pick up the upstream CVE fix (GHSA-56xg-wfcc-g829)
  • Experimental support for the Q8R16 quantized format with optimized matrix multiplication kernels
  • CMake files updated to build llama.aio on AmpereOne
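Building on AmpereOne should follow the usual llama.cpp CMake flow; a minimal sketch, assuming a checkout of this repository (no Ampere-specific flags are documented in these notes, so none are shown):

```shell
# Hedged sketch: standard out-of-source CMake build of llama.cpp,
# run from the repository root on an AmpereOne host.
cmake -B build
cmake --build build --config Release -j
```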