Releases: ggml-org/llama.cpp

b5429 · 20 May 05:39 · e298d2f

kv-cache : add SWA support (#13194)

* kv-cache : prepare for SWA
* kv-cache : initial iSWA implementation
* kv-cache : rework error recovery logic
* models : fix Phi-3 SWA parameters
* model : adjust Granite to rope factor changes
* server : check if context can do shifts
* iswa : for now, always enable shifts (experiment)
* kv-cache : simplify SWA logic
* kv-cache : apply defrag when we fail to find slots for the batch
* llama : update docs about llama_decode
* kv-cache : update warning logs when no space for the batch is available
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
* llama : add param to control SWA cache size
* minor : clean-up
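
Sliding-window attention (SWA) lets the KV cache retain only the most recent W positions per sequence, which is also why this release adds `llama_kv_self_seq_pos_min()` to report the oldest position still present. A toy sketch of the idea (names and logic are illustrative; llama.cpp's actual iSWA cache is far more involved, with per-layer windows, defragmentation, and shift support):

```python
from collections import deque

class SlidingWindowKVCache:
    """Toy KV cache that retains only the last `window` positions.

    Illustrative only -- not llama.cpp's implementation.
    """
    def __init__(self, window):
        self.window = window
        self.entries = deque()  # (pos, key, value)

    def append(self, pos, k, v):
        self.entries.append((pos, k, v))
        # Evict entries that have fallen outside the attention window.
        while self.entries and self.entries[0][0] <= pos - self.window:
            self.entries.popleft()

    def seq_pos_min(self):
        # Analogous in spirit to llama_kv_self_seq_pos_min():
        # the oldest position still available for attention.
        return self.entries[0][0] if self.entries else -1

cache = SlidingWindowKVCache(window=4)
for p in range(10):
    cache.append(p, f"k{p}", f"v{p}")
print(cache.seq_pos_min())  # positions 0..5 were evicted; oldest kept is 6
```

This is also why the release warns about "partial SWA context": once a position is evicted, the cache can no longer attend to it, so use cases that rewind past the window boundary are disallowed by the server.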

b5427 · 20 May 01:15 · f7c9429

sycl : Overcoming workaround for mmap() allocation on Windows (#13482)

* Remove mmap workaround on windows

After some testing I found that mmap is supported on Windows and for
many GPUs on Linux, so the Windows-specific workaround is not necessary
and has been removed.

* Update llama-bench README

The SYCL backend had introduced a workaround that allowed llama-bench
to run without specifying the `--mmp 0` flag.
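
For context on the mmap change above: memory-mapping lets model weights be paged in from disk on demand and shared between processes, instead of being copied into private memory up front. A minimal, self-contained illustration using Python's stdlib `mmap` (conceptual only; llama.cpp does this in C++ over GGUF files):

```python
import mmap
import os
import tempfile

# Write a small file to stand in for a model weights blob.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"GGUF" + bytes(range(16)))

# Map it read-only: bytes are paged in lazily, not copied eagerly.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        magic = mm[:4]   # reading a slice faults in only the needed pages
        size = mm.size()

os.remove(path)
print(magic)  # b'GGUF'
```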

b5426 · 19 May 19:36 · 1dfbf2c

common : add load_progress_callback (#13617)
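
A load-progress callback of this kind typically receives a fraction in [0, 1] after each chunk of work and can cancel loading by returning false. A hypothetical sketch of the pattern (names are illustrative, not the actual llama.cpp API):

```python
# Conceptual sketch of a load-progress callback, in the spirit of
# llama.cpp's progress reporting. `load_model` and `tensor_i` are
# made-up names for illustration.
def load_model(n_tensors, progress_callback=None):
    loaded = []
    for i in range(n_tensors):
        loaded.append(f"tensor_{i}")  # stand-in for reading one tensor
        if progress_callback is not None:
            keep_going = progress_callback((i + 1) / n_tensors)
            if not keep_going:
                break  # caller requested cancellation
    return loaded

reports = []
tensors = load_model(4, lambda p: (reports.append(p), True)[1])
print(reports)  # [0.25, 0.5, 0.75, 1.0]
```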

b5425 · 19 May 17:11 · 8960efd

Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 …

b5423 · 19 May 12:04 · 92ecdcc

mtmd : add vision support for llama 4 (#13282)

* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
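
The "pixel shuffle" step mentioned above is a token-reduction trick used by multimodal projectors: an r × r block of neighboring patch tokens is folded into one token with r² times the channels, shrinking the number of image tokens the language model must process. A toy sketch with plain lists (illustrative only; not the actual implementation):

```python
def pixel_shuffle_tokens(grid, r):
    """Fold each r x r block of patch tokens into one token whose feature
    vector is the concatenation of the block's features. `grid` is an
    H x W list of per-patch feature lists. Illustrative sketch.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(0, h, r):
        row = []
        for j in range(0, w, r):
            merged = []
            for di in range(r):
                for dj in range(r):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        out.append(row)
    return out

# 4x4 grid of 1-dim features -> 2x2 grid of 4-dim features
grid = [[[i * 4 + j] for j in range(4)] for i in range(4)]
out = pixel_shuffle_tokens(grid, 2)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 4
```

Getting the interleaving order of this reshape exactly right matters, which is presumably why the log above has a "correct pixel shuffle" step: a wrong order still produces valid-looking tensors but scrambles the spatial layout.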

b5422 · 19 May 11:53 · f71f40a

ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532)

b5421 · 19 May 11:43

sync : ggml

b5417 · 19 May 11:26 · 9c55e5c

fix: check model pointer validity before use (#13631)

b5416 · 19 May 07:28 · 33d7aed

CANN: Support MOE Model MUL_MAT_ID (#13042)

Signed-off-by: noemotiovon <[email protected]>

b5415 · 17 May 22:23 · 6a2bc8b

server : added --no-prefill-assistant flag (#13608)

* added no-prefill-assistant flag
* reworded documentation comment
* updated server README.md
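
For context, "assistant prefill" means that when the last message in a chat request has role `assistant`, the server treats its content as the forced beginning of the reply rather than as a completed turn; the new flag disables that behavior. A toy sketch of the idea (illustrative only, not llama-server's actual code):

```python
# Hypothetical helper showing the prefill split; `split_prefill` is a
# made-up name for illustration.
def split_prefill(messages, prefill_assistant=True):
    """If the last message is from the assistant and prefill is enabled,
    return (history, prefill_text); otherwise the reply starts from
    scratch."""
    if prefill_assistant and messages and messages[-1]["role"] == "assistant":
        return messages[:-1], messages[-1]["content"]
    return messages, ""

msgs = [
    {"role": "user", "content": "Name a prime."},
    {"role": "assistant", "content": "Sure: the number "},
]
history, prefill = split_prefill(msgs)
print(repr(prefill))  # 'Sure: the number '
history2, prefill2 = split_prefill(msgs, prefill_assistant=False)
print(repr(prefill2))  # ''
```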