Releases · ggml-org/llama.cpp
b5330
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)

* sycl : Implemented reorder Q4_0 mmvq
* sycl : Fixed mmvq being called when reorder is disabled
* sycl : Improved comments in the quants header
* Use static_assert
* safe_div -> ceil_div
* Clarify qi comment
* change the reorder tensor from init to execute OP
* dbg
* Undo changes to test-backend-ops
* Refactor changes on top of q4_0 reorder fix
* Missing Reverts
* Refactored opt_for_reorder logic to simplify code path
* Explicit inlining and unroll
* Renamed mul_mat_algo enum for consistency

Signed-off-by: Alberto Cabrera <[email protected]>
Co-authored-by: romain.biessy <[email protected]>
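The `safe_div -> ceil_div` item above refers to the usual rounding-up integer division used when sizing work-groups over quantized blocks. A minimal sketch of such a helper, assuming the conventional definition (the actual SYCL-backend version may differ):

```cpp
#include <cstddef>

// Integer division rounded up: the number of blocks of size b needed
// to cover a elements, e.g. ceil_div(10, 4) == 3.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}
```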
b5329
metal : optimize MoE for large batches (#13388) ggml-ci
b5328
CUDA: FA support for Deepseek (Ampere or newer) (#13306)

* CUDA: FA support for Deepseek (Ampere or newer)
* do loop unrolling via C++ template
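The loop-unrolling item names a standard technique: make the trip count a compile-time template parameter so the compiler can replicate the loop body instead of branching at runtime. A minimal standard-C++ sketch of the idea; the actual flash-attention kernels are CUDA and organized differently:

```cpp
#include <cstddef>
#include <cstdio>
#include <utility>

// Expands to f(0), f(1), ..., f(N-1): N is a template parameter, so the
// compiler emits N copies of the body with no runtime loop.
template <typename F, std::size_t... Is>
void unroll_impl(F &&f, std::index_sequence<Is...>) {
    (f(Is), ...);
}

template <std::size_t N, typename F>
void unroll(F &&f) {
    unroll_impl(std::forward<F>(f), std::make_index_sequence<N>{});
}

int main() {
    float acc[4] = {0};
    unroll<4>([&](std::size_t i) { acc[i] += 2.0f; }); // fully unrolled
    std::printf("%.1f %.1f %.1f %.1f\n", acc[0], acc[1], acc[2], acc[3]);
    return 0;
}
```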
b5327
llama : do not crash if there is no CPU backend (#13395)

* llama : do not crash if there is no CPU backend
* add checks to examples
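A sketch of the defensive pattern such a fix implies: treat a missing backend as a reportable error instead of dereferencing a null handle. The registry calls below are from the public ggml-backend API; the specific call sites changed in the PR are not shown here:

```cpp
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // The registry can return NULL when no CPU backend was built or loaded;
    // previously this case could lead to a crash further down.
    ggml_backend_t backend = ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_CPU, NULL);
    if (backend == NULL) {
        fprintf(stderr, "error: no CPU backend available\n");
        return 1;
    }
    // ... use the backend ...
    ggml_backend_free(backend);
    return 0;
}
```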
b5326
CUDA: fix crash on large batch size for MoE models (#13384)
b5325
imatrix : Add --parse-special for enabling parsing of special tokens …
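With this option enabled, the imatrix tool also parses special tokens (e.g. chat-template control tokens) when tokenizing the calibration text. A typical invocation, assuming the tool's standard `-m`/`-f` options: `llama-imatrix -m model.gguf -f calibration.txt --parse-special`.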
b5324
llama-run: add support for downloading models from ModelScope (#13370)

Signed-off-by: Xiaodong Ye <[email protected]>
b5323
mtmd : fix batch_view for m-rope (#13397)

* mtmd : fix batch_view for m-rope
* nits : fix comment
b5322
llama : one-off chat template fix for Mistral-Small-2503 (#13398)

* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
b5321
rpc : add rpc_msg_set_tensor_hash_req (#13353)

* rpc : add rpc_msg_set_tensor_hash_req

  Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH, which makes the code cleaner.

* fix
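A hedged sketch of the pattern described: collect the RPC_CMD_SET_TENSOR_HASH request fields into one dedicated, fixed-layout struct instead of serializing them ad hoc. The field names below are illustrative assumptions; see #13353 for the actual layout:

```cpp
#include <cstdint>

// Illustrative only: field names are assumptions, not the layout from #13353.
// The point of the pattern is that everything the server needs for
// RPC_CMD_SET_TENSOR_HASH travels in a single fixed-size message.
struct rpc_msg_set_tensor_hash_req {
    uint64_t tensor_id; // assumed: which remote tensor to write
    uint64_t offset;    // assumed: byte offset within the tensor
    uint64_t hash;      // assumed: hash identifying previously cached data
};
```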