Releases · ggml-org/llama.cpp
b6686
b6685
server : context checkpointing for hybrid and recurrent models (#16382)
* generalize `swa_checkpoint` to `ctx_checkpoint`: this extends `llama-server`'s SWA checkpointing logic to include hybrid/recurrent models such as Jamba and Granite
* keep backwards compat with `--swa-checkpoints`
* disable debug prints
* update prompt re-processing message
* fix off-by-one error per GG
* keep `seq_rm` log per GG
* server : fix checkpoint logic to support recurrent caches
* server : cleanup and fixes
Co-authored-by: Georgi Gerganov <[email protected]>
b6684
metal : fix loop bound in ggml_mem_ranges (#16412)
b6683
llama : fix shapes for bert/mpt q/k norm (#16409)
b6682
ggml : fix graph reallocation with multiple chunks (#16396) reallocation is needed if a single chunk grows in size, even if total allocation size stays the same or is lower
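The note above describes a per-chunk condition rather than a total-size one. As a rough illustration only (this is not ggml's actual allocator API; the function name and the chunk representation here are assumptions), the check might look like:

```cpp
#include <cstddef>
#include <vector>

// Decide whether a multi-chunk allocation has to be redone.
// Comparing only the totals misses the case where one chunk grows
// while another shrinks, so each chunk is checked individually.
bool needs_realloc(const std::vector<size_t> & old_chunk_sizes,
                   const std::vector<size_t> & new_chunk_sizes) {
    if (new_chunk_sizes.size() > old_chunk_sizes.size()) {
        return true; // more chunks than were previously reserved
    }
    for (size_t i = 0; i < new_chunk_sizes.size(); ++i) {
        if (new_chunk_sizes[i] > old_chunk_sizes[i]) {
            return true; // this chunk outgrew its reservation, even if the total did not
        }
    }
    return false;
}
```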
b6680
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (#1…
b6679
vulkan: Fix FA coopmat1 invalid array indexing (#16365) When computing sinks, the cm1 shader was looping r from 0 to Br rather than to rows_per_thread. I must have copied this from the scalar path (where it is correct), and somehow it wasn't causing failures on current drivers.
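For readers unfamiliar with the bug class: the loop bound decides how many per-row accumulators a thread touches. The following is a plain C++ sketch of the pattern, not the actual coopmat1 GLSL shader; `Br`, `rows_per_thread`, and the sink accumulation come from the commit message, everything else is assumed.

```cpp
#include <vector>

// Each thread owns rows_per_thread partial results, not all Br rows of the block.
void add_sinks(std::vector<float> & acc, float sink, int Br, int rows_per_thread) {
    (void) Br; // the buggy version looped r < Br and indexed past the end of acc
    for (int r = 0; r < rows_per_thread; ++r) {
        acc[r] += sink; // corrected bound: stay within this thread's rows
    }
}
```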
b6678
ci : change macos-13 to macos-15-intel (#16401) This commit updates the macos-13 runners to macos-15-intel. The motivation for this change is that the macos-13 runners are scheduled to be retired on 2025-12-04. Refs: https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-closing-down/
b6676
vulkan: in flash attention, bounds check against nem1 (don't rely on …
b6673
test-barrier : do not use more threads than physically available (#16…
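The intent of the last fix, as its title states, is to cap the test's thread count at what the machine actually provides. A minimal, hedged sketch of that clamping (the helper name is made up; the real test may query core counts differently):

```cpp
#include <algorithm>
#include <thread>

// Clamp a requested worker count to the number of hardware threads,
// falling back to the request if the hardware count is unknown (0).
int clamp_n_threads(int requested) {
    const unsigned hw = std::thread::hardware_concurrency();
    return hw == 0 ? requested : std::min<int>(requested, (int) hw);
}
```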