Releases · ggml-org/llama.cpp
b5380
server : passthrough the /models endpoint during loading (#13535)
* server : passthrough the /models endpoint during loading
* server : update readme + return json for "meta" field
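For context, a minimal polling sketch of what this passthrough enables: a client can query /models while the model is still loading instead of waiting for startup to finish. The host, port, retry loop, and use of libcurl are illustrative assumptions; only the /models path and the "meta" field come from the entry above.

```cpp
// Hypothetical client: poll llama-server's /models endpoint while the model is
// still loading. Assumes a server at localhost:8080; everything besides the
// endpoint path is an illustrative assumption.
#include <curl/curl.h>
#include <chrono>
#include <cstdio>
#include <string>
#include <thread>

static size_t collect(char * data, size_t size, size_t nmemb, void * userp) {
    static_cast<std::string *>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL * curl = curl_easy_init();
    if (!curl) {
        return 1;
    }
    for (int attempt = 0; attempt < 10; ++attempt) {
        std::string body;
        curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/models");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
        if (curl_easy_perform(curl) == CURLE_OK) {
            long status = 0;
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
            // With the passthrough, this route should answer (including model
            // metadata in a "meta" field) even before loading completes.
            std::printf("HTTP %ld: %s\n", status, body.c_str());
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```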
b5379
server : fix cache_tokens bug with no cache_prompt (#13533)
b5378
cmake: simplify vulkan shader test logic (#13263)
b5377
vulkan: KHR_coopmat flash attention (#13506)
This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more difficult for various reasons, so I haven't done it. Performance for this shader is around 2.5x better than for the scalar shader when doing prompt processing. Some of the benefit may be from other optimizations like staging through shared memory, or splitting by rows.
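For readers less familiar with the flash-attention pipeline, here is a plain scalar C++ reference of the two products the entry mentions, softmax(Q·K^T·scale)·V: the Q*K^T product is what the new shader maps onto KHR_coopmat, while the P*V product stays on the existing scalar path. This is a sketch of the math only, not the shader's actual layout or code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Scalar reference for one attention head, row-major buffers:
//   q: n_q x d, k: n_kv x d, v: n_kv x d, out: n_q x d
void attention_ref(const std::vector<float> & q, const std::vector<float> & k,
                   const std::vector<float> & v, std::vector<float> & out,
                   int n_q, int n_kv, int d) {
    const float scale = 1.0f / std::sqrt((float) d);
    std::vector<float> p(n_kv);
    for (int i = 0; i < n_q; ++i) {
        // S = Q*K^T (scaled): the product the new shader maps onto coopmat
        float s_max = -INFINITY;
        for (int j = 0; j < n_kv; ++j) {
            float s = 0.0f;
            for (int c = 0; c < d; ++c) {
                s += q[i*d + c] * k[j*d + c];
            }
            p[j]  = s * scale;
            s_max = std::max(s_max, p[j]);
        }
        // P = row-wise softmax(S)
        float sum = 0.0f;
        for (int j = 0; j < n_kv; ++j) {
            p[j] = std::exp(p[j] - s_max);
            sum += p[j];
        }
        // O = P*V: the product the entry says still runs on the scalar path
        for (int c = 0; c < d; ++c) {
            float o = 0.0f;
            for (int j = 0; j < n_kv; ++j) {
                o += p[j] * v[j*d + c];
            }
            out[i*d + c] = o / sum;
        }
    }
}

int main() {
    const int n_q = 2, n_kv = 3, d = 4;
    std::vector<float> q(n_q*d, 0.1f), k(n_kv*d, 0.2f), v(n_kv*d, 0.3f), out(n_q*d);
    attention_ref(q, k, v, out, n_q, n_kv, d);
    std::printf("out[0] = %f\n", out[0]);
    return 0;
}
```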
b5372
vulkan: workaround FA compile failures on macos (#13517)
b5371
quantize : improve tensor-type pattern matching (#13033)
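A sketch of the idea behind tensor-type pattern matching: per-tensor quantization overrides are chosen by matching tensor names (llama.cpp tensors are named like blk.12.attn_v.weight) against user-supplied patterns. The override table and regex syntax below are hypothetical illustrations, not the quantize tool's actual option format or matcher.

```cpp
// Illustrative matcher: map tensor names onto quantization-type overrides.
// Only the tensor naming scheme comes from llama.cpp; the patterns and the
// target types here are made up for the example.
#include <cstdio>
#include <regex>
#include <string>
#include <utility>
#include <vector>

int main() {
    // hypothetical overrides: regex over the tensor name -> target type
    const std::vector<std::pair<std::regex, std::string>> overrides = {
        { std::regex(R"(blk\.\d+\.attn_v\.weight)"),        "q8_0" },
        { std::regex(R"(blk\.(2[0-9])\.ffn_down\.weight)"), "q6_K" },
    };
    const std::vector<std::string> tensors = {
        "blk.0.attn_q.weight", "blk.3.attn_v.weight", "blk.25.ffn_down.weight",
    };
    for (const auto & name : tensors) {
        std::string type = "default";
        for (const auto & [re, t] : overrides) {
            if (std::regex_match(name, re)) { type = t; break; }
        }
        std::printf("%-28s -> %s\n", name.c_str(), type.c_str());
    }
    return 0;
}
```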
b5370
clip : clip.h becomes a private API (⚠️ breaking change) (#13510)
b5369
metal : use FA-vec kernel up to batch size 20 (#13496)
* batched-bench : fix pp batch contents
* metal : optimize multi-sequence FA vec kernel
* metal : use FA-vec kernel up to batch size 20
b5368
metal : optimize multi-sequence FA vec kernel (#13493)
* batched-bench : fix pp batch contents
* metal : optimize multi-sequence FA vec kernel
b5367
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509)
Signed-off-by: Dan Johansson <[email protected]>