Releases: ggml-org/llama.cpp

b5380

14 May 14:06
0531744
server : passthrough the /models endpoint during loading (#13535)

* server : passthrough the /models endpoint during loading

* server : update readme + return json for "meta" field

b5379

14 May 12:10
360a9c9
server : fix cache_tokens bug with no cache_prompt (#13533)

b5378

14 May 11:13
09d13d9
cmake: simplify vulkan shader test logic (#13263)

b5377

14 May 10:27
24e86ca
vulkan: KHR_coopmat flash attention (#13506)

This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may be from other optimizations like staging
through shared memory, or splitting by rows.
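For readers unfamiliar with the two multiplies named above, the following is a minimal pure-Python sketch of a flash-attention-style loop. It is not the Vulkan shader and does not use cooperative matrices; it only shows where the Q*K^T product and the P*V product sit inside a tiled loop with a running softmax. The function names, the tile size, and the list-of-lists layout are assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Plain row-by-column product of two matrices stored as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def attention_reference(q, k, v):
    """Unfused attention: softmax(Q*K^T / sqrt(d)) * V."""
    scale = 1.0 / math.sqrt(len(q[0]))
    s = matmul(q, transpose(k))                       # Q*K^T
    p = []
    for row in s:
        m = max(x * scale for x in row)
        e = [math.exp(x * scale - m) for x in row]
        total = sum(e)
        p.append([x / total for x in e])
    return matmul(p, v)                               # P*V

def attention_tiled(q, k, v, tile=2):
    """Hypothetical tiled variant: walk over K/V in tiles, keep a running softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    n_q, dv = len(q), len(v[0])
    run_max = [-math.inf] * n_q                       # running row maximum
    run_sum = [0.0] * n_q                             # running softmax denominator
    acc = [[0.0] * dv for _ in range(n_q)]            # unnormalized output
    for start in range(0, len(k), tile):
        k_tile = k[start:start + tile]
        v_tile = v[start:start + tile]
        s = matmul(q, transpose(k_tile))              # first multiply: Q*K^T
        for i in range(n_q):
            scores = [x * scale for x in s[i]]
            new_max = max(run_max[i], max(scores))
            correction = math.exp(run_max[i] - new_max)   # rescale earlier tiles
            p = [math.exp(x - new_max) for x in scores]
            run_sum[i] = run_sum[i] * correction + sum(p)
            pv = matmul([p], v_tile)[0]               # second multiply: P*V
            acc[i] = [a * correction + b for a, b in zip(acc[i], pv)]
            run_max[i] = new_max
    return [[a / run_sum[i] for a in acc[i]] for i in range(n_q)]
```

Both functions should agree up to floating-point rounding. In the sketch, the first multiply is the step the commit message describes as accelerated with coopmat1, and the second is the step it describes as left on the existing path.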

b5372

14 May 04:32
ab3971f
vulkan: workaround FA compile failures on macos (#13517)

b5371

13 May 18:00
e5c834f
quantize : improve tensor-type pattern matching (#13033)

b5370

13 May 16:42
71bdbdb
clip : clip.h becomes a private API (⚠️ breaking change) (#13510)

b5369

13 May 16:39
f0995d2
metal : use FA-vec kernel up to batch size 20 (#13496)

* batched-bench : fix pp batch contents

* metal : optimize multi-sequence FA vec kernel

ggml-ci

* metal : use FA-vec kernel up to batch size 20

ggml-ci

b5368

13 May 16:21
c252e0c
metal : optimize multi-sequence FA vec kernel (#13493)

* batched-bench : fix pp batch contents

* metal : optimize multi-sequence FA vec kernel

ggml-ci

b5367

13 May 16:21
4f711af
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509)

Signed-off-by: Dan Johansson <[email protected]>