
Releases: ggml-org/llama.cpp

b6686

03 Oct 20:02
128d522
chat : support Magistral thinking (#16413)

* feat: added a dedicated Magistral chat format that preserves [THINK] spans, parses reasoning before tool calls

* feat: new flow in the chat template test suite for Magistral
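To illustrate what separating reasoning from the visible reply can look like, here is a minimal sketch. It is not the parser added in this release: the `parsed_message` struct, the `split_think` function, and the `[/THINK]` closing delimiter are assumptions made for this example only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not a llama.cpp type).
struct parsed_message {
    std::string reasoning; // text found inside [THINK] spans
    std::string content;   // everything outside those spans
};

// Hypothetical helper: split text into reasoning and content.
// Assumes spans open with "[THINK]" and close with "[/THINK]".
parsed_message split_think(const std::string & text) {
    static const std::string open_tag  = "[THINK]";
    static const std::string close_tag = "[/THINK]";

    parsed_message out;
    std::size_t pos = 0;
    while (true) {
        const std::size_t start = text.find(open_tag, pos);
        if (start == std::string::npos) {
            // no further spans: the rest is regular content
            out.content += text.substr(pos);
            break;
        }
        out.content += text.substr(pos, start - pos);

        const std::size_t body = start + open_tag.size();
        const std::size_t end  = text.find(close_tag, body);
        if (end == std::string::npos) {
            // unterminated span: treat the remainder as reasoning
            out.reasoning += text.substr(body);
            break;
        }
        out.reasoning += text.substr(body, end - body);
        pos = end + close_tag.size();
    }
    return out;
}
```

Treating an unterminated span as reasoning is a design choice of this sketch; it keeps partially streamed thoughts out of the user-visible content.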

b6685

03 Oct 19:55
f6dcda3
server : context checkpointing for hybrid and recurrent models (#16382)

* initial commit for branch 3

* generalize `swa_checkpoint` to `ctx_checkpoint`

this extends `llama-server`'s SWA checkpointing logic to include
hybrid/recurrent models such as Jamba, Granite

* oops

* disable debug prints

* keep backwards compat with `--swa-checkpoints`

Co-authored-by: Georgi Gerganov <[email protected]>

* update prompt re-processing message

* fix off-by-one error per GG

* keep `seq_rm` log per GG

Co-authored-by: Georgi Gerganov <[email protected]>

* server : fix checkpoint logic to support recurrent caches

* server : cleanup and fixes

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b6684

03 Oct 16:55
606a73f
metal : fix loop bound in ggml_mem_ranges (#16412)

b6683

03 Oct 12:59
946f71e
llama : fix shapes for bert/mpt q/k norm (#16409)

b6682

03 Oct 12:13
638d330
ggml : fix graph reallocation with multiple chunks (#16396)

reallocation is needed if a single chunk grows in size,
even if total allocation size stays the same or is lower
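The condition described here can be sketched as a per-chunk comparison. This is an illustration only, not the ggml allocator code: `needs_realloc` and `needs_realloc_total_only` are hypothetical helpers, and chunk sizes are modeled as plain vectors of byte counts.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not the ggml API): reallocate if any single chunk
// must grow, or if more chunks are required than are currently allocated.
bool needs_realloc(const std::vector<std::size_t> & cur_sizes,
                   const std::vector<std::size_t> & new_sizes) {
    if (new_sizes.size() > cur_sizes.size()) {
        return true;
    }
    for (std::size_t i = 0; i < new_sizes.size(); ++i) {
        if (new_sizes[i] > cur_sizes[i]) {
            return true; // this chunk no longer fits in its allocation
        }
    }
    return false;
}

// Insufficient check shown for contrast: compares only the totals,
// so it misses a chunk that grows while another one shrinks.
bool needs_realloc_total_only(const std::vector<std::size_t> & cur_sizes,
                              const std::vector<std::size_t> & new_sizes) {
    const std::size_t cur_total =
        std::accumulate(cur_sizes.begin(), cur_sizes.end(), std::size_t{0});
    const std::size_t new_total =
        std::accumulate(new_sizes.begin(), new_sizes.end(), std::size_t{0});
    return new_total > cur_total;
}
```

With current chunks of 100 and 100 bytes and required sizes of 150 and 20 bytes, the total drops from 200 to 170, yet the first chunk has grown, so only the per-chunk check reports that reallocation is needed.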

b6680

03 Oct 11:10
2aaf0a2
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (#1…

b6679

03 Oct 10:27
0e1f838
vulkan: Fix FA coopmat1 invalid array indexing (#16365)

When computing sinks, the cm1 shader was looping r from 0 to Br rather than
to rows_per_thread. I must have copied this from the scalar path (where it is
correct), and somehow it wasn't causing failures on current drivers.

b6678

03 Oct 10:19
ad12647
ci : change macos-13 to macos-15-intel (#16401)

This commit updates the macos-13 runners to macos-15-intel.

The motivation for this change is that the macos-13 runners are scheduled
to be retired on 2025-12-04.

Refs: https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-closing-down/

b6676

03 Oct 09:31
e308efd
vulkan: in flash attention, bounds check against nem1 (don't rely on …

b6673

02 Oct 19:32
d64c810
test-barrier : do not use more threads than physically available (#16…