Releases: ggml-org/llama.cpp

b5330

09 May 16:58
17512a9
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)

* sycl : Implemented reorder Q4_0 mmvq

Signed-off-by: Alberto Cabrera <[email protected]>

* sycl : Fixed mmvq being called when reorder is disabled

* sycl : Improved comments in the quants header

Signed-off-by: Alberto Cabrera <[email protected]>

* Use static_assert

* safe_div -> ceil_div

* Clarify qi comment

* change the reorder tensor from init to execute OP

* dbg

* Undo changes to test-backend-ops

* Refactor changes on top of q4_0 reorder fix

* Missing Reverts

* Refactored opt_for_reorder logic to simplify code path

* Explicit inlining and unroll

* Renamed mul_mat_algo enum for consistency

---------

Signed-off-by: Alberto Cabrera <[email protected]>
Co-authored-by: romain.biessy <[email protected]>

b5329

09 May 16:38
611aa91
metal : optimize MoE for large batches (#13388)

ggml-ci

b5328

09 May 13:28
0cf6725
CUDA: FA support for Deepseek (Ampere or newer) (#13306)

* CUDA: FA support for Deepseek (Ampere or newer)

* do loop unrolling via C++ template

b5327

09 May 12:08
27ebfca
llama : do not crash if there is no CPU backend (#13395)

* llama : do not crash if there is no CPU backend

* add checks to examples

b5326

09 May 11:45
5c86c9e
CUDA: fix crash on large batch size for MoE models (#13384)

b5325

09 May 11:11
efb8b47
imatrix : Add --parse-special for enabling parsing of special tokens …

b5324

09 May 11:10
0527771
llama-run: add support for downloading models from ModelScope (#13370)

Signed-off-by: Xiaodong Ye <[email protected]>

b5323

09 May 11:01
2189fd3
mtmd : fix batch_view for m-rope (#13397)

* mtmd : fix batch_view for m-rope

* nits : fix comment

b5322

09 May 10:08
3f96aef
llama : one-off chat template fix for Mistral-Small-2503 (#13398)

* llama : one-off chat template fix for Mistral-Small-2503

* update readme

* add mistral-v7-tekken

b5321

09 May 08:43
b486ba0
rpc : add rpc_msg_set_tensor_hash_req (#13353)

* rpc : add rpc_msg_set_tensor_hash_req

Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH which
makes the code cleaner.

* fix