Releases: ggml-org/llama.cpp

b5429 · 20 May 05:39 · e298d2f

kv-cache : add SWA support (#13194)

* kv-cache : prepare for SWA
* kv-cache : initial iSWA implementation
* kv-cache : rework error recovery logic
* models : fix Phi-3 SWA parameters
* model : adjust Granite to rope factor changes
* server : check if context can do shifts
* iswa : for now, always enable shifts (experiment)
* kv-cache : simplify SWA logic
* kv-cache : apply defrag when we fail to find slots for the batch
* llama : update docs about llama_decode
* kv-cache : update warning logs when no space for the batch is available
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
* llama : add param to control SWA cache size
* minor : clean-up
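
Sliding-window attention (SWA) lets the KV cache retain only the most recent W positions per sequence, which is also why this release adds `llama_kv_self_seq_pos_min()` to report the oldest position still present. A toy sketch of the idea (names and logic are illustrative; llama.cpp's actual iSWA cache is far more involved, with per-layer windows, defragmentation, and shift support):

```python
from collections import deque

class SlidingWindowKVCache:
    """Toy KV cache that retains only the last `window` positions.

    Illustrative only -- not llama.cpp's implementation.
    """
    def __init__(self, window):
        self.window = window
        self.entries = deque()  # (pos, key, value)

    def append(self, pos, k, v):
        self.entries.append((pos, k, v))
        # Evict entries that have fallen outside the attention window.
        while self.entries and self.entries[0][0] <= pos - self.window:
            self.entries.popleft()

    def seq_pos_min(self):
        # Analogous in spirit to llama_kv_self_seq_pos_min():
        # the oldest position still available for attention.
        return self.entries[0][0] if self.entries else -1

cache = SlidingWindowKVCache(window=4)
for p in range(10):
    cache.append(p, f"k{p}", f"v{p}")
print(cache.seq_pos_min())  # positions 0..5 were evicted; oldest kept is 6
```

This is also why the release warns about "partial SWA context": once a position is evicted, the cache can no longer attend to it, so use cases that rewind past the window boundary are disallowed by the server.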

b5427 · 20 May 01:15 · f7c9429

sycl : Overcoming workaround for mmap() allocation on Windows (#13482)

* Remove mmap workaround on windows

After some testing I found that mmap is supported on Windows and for
many GPUs on Linux, so the Windows-specific workaround is not necessary
and has been removed.

* Update llama-bench README

The SYCL backend had introduced a workaround that allowed llama-bench
to run without specifying the `--mmp 0` flag.
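
For context on the mmap change above: memory-mapping lets model weights be paged in from disk on demand and shared between processes, instead of being copied into private memory up front. A minimal, self-contained illustration using Python's stdlib `mmap` (conceptual only; llama.cpp does this in C++ over GGUF files):

```python
import mmap
import os
import tempfile

# Write a small file to stand in for a model weights blob.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"GGUF" + bytes(range(16)))

# Map it read-only: bytes are paged in lazily, not copied eagerly.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        magic = mm[:4]   # reading a slice faults in only the needed pages
        size = mm.size()

os.remove(path)
print(magic)  # b'GGUF'
```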

b5426 · 19 May 19:36 · 1dfbf2c

common : add load_progress_callback (#13617)
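
A load-progress callback of this kind typically receives a fraction in [0, 1] after each chunk of work and can cancel loading by returning false. A hypothetical sketch of the pattern (names are illustrative, not the actual llama.cpp API):

```python
# Conceptual sketch of a load-progress callback, in the spirit of
# llama.cpp's progress reporting. `load_model` and `tensor_i` are
# made-up names for illustration.
def load_model(n_tensors, progress_callback=None):
    loaded = []
    for i in range(n_tensors):
        loaded.append(f"tensor_{i}")  # stand-in for reading one tensor
        if progress_callback is not None:
            keep_going = progress_callback((i + 1) / n_tensors)
            if not keep_going:
                break  # caller requested cancellation
    return loaded

reports = []
tensors = load_model(4, lambda p: (reports.append(p), True)[1])
print(reports)  # [0.25, 0.5, 0.75, 1.0]
```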

b5425 · 19 May 17:11 · 8960efd

Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 …

b5423 · 19 May 12:04 · 92ecdcc

mtmd : add vision support for llama 4 (#13282)

* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
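
The "pixel shuffle" step mentioned above is a token-reduction trick used by multimodal projectors: an r × r block of neighboring patch tokens is folded into one token with r² times the channels, shrinking the number of image tokens the language model must process. A toy sketch with plain lists (illustrative only; not the actual implementation):

```python
def pixel_shuffle_tokens(grid, r):
    """Fold each r x r block of patch tokens into one token whose feature
    vector is the concatenation of the block's features. `grid` is an
    H x W list of per-patch feature lists. Illustrative sketch.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(0, h, r):
        row = []
        for j in range(0, w, r):
            merged = []
            for di in range(r):
                for dj in range(r):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        out.append(row)
    return out

# 4x4 grid of 1-dim features -> 2x2 grid of 4-dim features
grid = [[[i * 4 + j] for j in range(4)] for i in range(4)]
out = pixel_shuffle_tokens(grid, 2)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 4
```

Getting the interleaving order of this reshape exactly right matters, which is presumably why the log above has a "correct pixel shuffle" step: a wrong order still produces valid-looking tensors but scrambles the spatial layout.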

b5422 · 19 May 11:53 · f71f40a

ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532)

b5421 · 19 May 11:43

sync : ggml

b5417 · 19 May 11:26 · 9c55e5c

fix: check model pointer validity before use (#13631)

b5416 · 19 May 07:28 · 33d7aed

CANN: Support MOE Model MUL_MAT_ID (#13042)

Signed-off-by: noemotiovon <[email protected]>

b5415 · 17 May 22:23 · 6a2bc8b

server : added --no-prefill-assistant flag (#13608)

* added no-prefill-assistant flag
* reworded documentation comment
* updated server README.md
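
For context, "assistant prefill" means that when the last message in a chat request has role `assistant`, the server treats its content as the forced beginning of the reply rather than as a completed turn; the new flag disables that behavior. A toy sketch of the idea (illustrative only, not llama-server's actual code):

```python
# Hypothetical helper showing the prefill split; `split_prefill` is a
# made-up name for illustration.
def split_prefill(messages, prefill_assistant=True):
    """If the last message is from the assistant and prefill is enabled,
    return (history, prefill_text); otherwise the reply starts from
    scratch."""
    if prefill_assistant and messages and messages[-1]["role"] == "assistant":
        return messages[:-1], messages[-1]["content"]
    return messages, ""

msgs = [
    {"role": "user", "content": "Name a prime."},
    {"role": "assistant", "content": "Sure: the number "},
]
history, prefill = split_prefill(msgs)
print(repr(prefill))  # 'Sure: the number '
history2, prefill2 = split_prefill(msgs, prefill_assistant=False)
print(repr(prefill2))  # ''
```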