Releases · ggml-org/llama.cpp
b5429
kv-cache : add SWA support (#13194)
* kv-cache : prepare for SWA
* kv-cache : initial iSWA implementation
* kv-cache : rework error recovery logic
* models : fix Phi-3 SWA parameters
* model : adjust Granite to rope factor changes
* server : check if context can do shifts
* iswa : for now, always enable shifts (experiment)
* kv-cache : simplify SWA logic
* kv-cache : apply defrag when we fail to find slots for the batch
* llama : update docs about llama_decode
* kv-cache : update warning logs when no space for the batch is available
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
* llama : add param to control SWA cache size
* minor : clean-up
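The new `llama_kv_self_seq_pos_min()` is what lets callers detect that the sliding window has already discarded part of a sequence. A minimal sketch of such a check, assuming the signature `llama_pos llama_kv_self_seq_pos_min(llama_context *, llama_seq_id)` introduced in this release (verify against your `llama.h`):

```cpp
#include "llama.h"

// A cached prompt prefix is only reusable if the sliding window has not yet
// evicted its oldest tokens: pos_min > 0 means positions [0, pos_min) are
// gone from the SWA cache and those tokens would have to be decoded again.
static bool swa_prefix_resident(llama_context * ctx, llama_seq_id seq) {
    const llama_pos pos_min = llama_kv_self_seq_pos_min(ctx, seq);
    return pos_min == 0;
}
```

This is the same reasoning behind the server change above that disallows use cases relying on partially evicted SWA context.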
b5427
sycl : remove mmap() allocation workaround on Windows (#13482)
* Remove the mmap workaround on Windows: testing showed that mmap is supported on Windows, and on many GPUs on Linux as well, so the workaround is no longer necessary.
* Update the llama-bench README: with the workaround gone, llama-bench runs on the SYCL backend without requiring the `--mmp 0` flag.
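For consumers of the C API, memory mapping is controlled per model load. A minimal sketch of the equivalent toggle, assuming the long-standing `use_mmap` field of `llama_model_params` (mapping llama-bench's `--mmp 0` onto this field is an assumption):

```cpp
#include "llama.h"

int main() {
    llama_model_params mparams = llama_model_default_params();
    // mmap is the default; disabling it forces a full read of the weights
    // into memory, which the removed SYCL workaround previously did
    // unconditionally on Windows (assumed equivalent of `--mmp 0`)
    mparams.use_mmap = false;
    return 0;
}
```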
b5426
common : add load_progress_callback (#13617)
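The `common` helper presumably forwards to the progress callback already exposed by the C API. A minimal sketch of that underlying mechanism, assuming the `progress_callback` / `progress_callback_user_data` fields of `llama_model_params`:

```cpp
#include <cstdio>
#include "llama.h"

// called repeatedly during loading with progress in [0, 1];
// returning false aborts the load
static bool on_load_progress(float progress, void * /*user_data*/) {
    fprintf(stderr, "\rloading model: %3d%%", (int) (progress * 100));
    return true;
}

int main() {
    llama_model_params mparams = llama_model_default_params();
    mparams.progress_callback           = on_load_progress;
    mparams.progress_callback_user_data = nullptr;

    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (model != nullptr) {
        llama_model_free(model);
    }
    return 0;
}
```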
b5425
Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 …
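The incoherence class of bug comes from accumulating long dot products in half precision: once the running sum grows large enough, small addends are rounded away entirely. A self-contained analogy, with float standing in for f16 (the mechanism is identical, only the threshold differs):

```cpp
#include <cstdio>

int main() {
    // float (24-bit significand) stops absorbing +1 once the running sum
    // reaches 2^24; half precision (11-bit significand) hits the same wall
    // at 2^11 = 2048, shorter than many mul_mat rows in a large model,
    // which is why the accumulator must be widened to f32
    float acc = 0.0f;
    for (int i = 0; i < 20000000; ++i) {
        acc += 1.0f;
    }
    printf("%.1f\n", acc); // prints 16777216.0, not 20000000.0
    return 0;
}
```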
b5423
mtmd : add vision support for llama 4 (#13282)
* wip llama 4 conversion; fix conversion issues
* remove redundant __init__
* reshape patch_embeddings_0; fix view
* remove ffn_post_norm
* cgraph ok
* use f32 for positional embeddings
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* remove load_image_size and unused lines
* add test & docs; fix llava-1.6 test
* test: add notion of huge models
* add warning about degraded quality
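The pixel-shuffle step mentioned above is the standard space-to-depth rearrangement that vision projectors use to trade grid resolution for channel width. A reference sketch of the operation (row-major layout and naming are illustrative, not mtmd's actual implementation):

```cpp
#include <vector>

// Merge each r x r block of patch embeddings into one embedding with
// r*r times the channels, shrinking the patch grid by r per dimension.
// Input x is a row-major [h][w][c] grid; output is [h/r][w/r][c*r*r].
std::vector<float> pixel_shuffle(const std::vector<float> & x,
                                 int h, int w, int c, int r) {
    std::vector<float> out(x.size());
    const int oh = h / r, ow = w / r, oc = c * r * r;
    for (int y  = 0; y  < oh; ++y)
    for (int xo = 0; xo < ow; ++xo)
    for (int dy = 0; dy < r;  ++dy)
    for (int dx = 0; dx < r;  ++dx)
    for (int ch = 0; ch < c;  ++ch) {
        const int src = ((y*r + dy) * w + (xo*r + dx)) * c + ch;
        const int dst = (y * ow + xo) * oc + (dy * r + dx) * c + ch;
        out[dst] = x[src];
    }
    return out;
}
```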
b5422
ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532)
b5421
sync : ggml
b5417
fix: check model pointer validity before use (#13631)
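The pattern behind this class of fix, sketched against the current C API (function names assumed from recent `llama.h`; adjust to your version):

```cpp
#include <cstdio>
#include "llama.h"

int main() {
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model\n");
        return 1; // never touch `model` past this point
    }
    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (ctx == nullptr) {
        llama_model_free(model);
        return 1;
    }
    llama_free(ctx);
    llama_model_free(model);
    return 0;
}
```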
b5416
CANN : support MUL_MAT_ID for MoE models (#13042)
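MUL_MAT_ID is ggml's mixture-of-experts matrix multiply: for each row of activations, a router-chosen expert id selects which weight matrix is applied. A simplified reference of its semantics (one expert per token; ggml's actual op handles top-k experts and different tensor layouts):

```cpp
#include <vector>

// For each token, multiply its activation by the weight matrix of the
// expert the router selected for it.
void mul_mat_id(const std::vector<std::vector<float>> & experts, // [n_expert][n_out*n_in]
                const float * x,   // [n_tokens][n_in] activations
                const int   * ids, // [n_tokens] expert id per token
                float       * y,   // [n_tokens][n_out] output
                int n_tokens, int n_in, int n_out) {
    for (int t = 0; t < n_tokens; ++t) {
        const float * w = experts[ids[t]].data();
        for (int o = 0; o < n_out; ++o) {
            float acc = 0.0f;
            for (int i = 0; i < n_in; ++i) {
                acc += w[o * n_in + i] * x[t * n_in + i];
            }
            y[t * n_out + o] = acc;
        }
    }
}
```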
b5415
server : add --no-prefill-assistant flag (#13608)
* added the no-prefill-assistant flag
* reworded the documentation comment
* updated the server README.md