Releases · ggml-org/llama.cpp
b5330
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)

* sycl : Implemented reorder Q4_0 mmvq
* sycl : Fixed mmvq being called when reorder is disabled
* sycl : Improved comments in the quants header
* Use static_assert
* safe_div -> ceil_div
* Clarify qi comment
* change the reorder tensor from init to execute OP
* dbg
* Undo changes to test-backend-ops
* Refactor changes on top of q4_0 reorder fix
* Missing Reverts
* Refactored opt_for_reorder logic to simplify code path
* Explicit inlining and unroll
* Renamed mul_mat_algo enum for consistency

Signed-off-by: Alberto Cabrera <[email protected]>
Co-authored-by: romain.biessy <[email protected]>
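The `safe_div -> ceil_div` item above refers to the usual rounding-up integer division used when sizing work-groups over quantized blocks. A minimal sketch of such a helper, assuming the conventional definition (the actual SYCL-backend version may differ):

```cpp
#include <cstddef>

// Integer division rounded up: the number of blocks of size b needed
// to cover a elements, e.g. ceil_div(10, 4) == 3.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}
```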
b5329
metal : optimize MoE for large batches (#13388) ggml-ci
b5328
CUDA: FA support for Deepseek (Ampere or newer) (#13306)

* CUDA: FA support for Deepseek (Ampere or newer)
* do loop unrolling via C++ template
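The loop-unrolling item names a standard technique: make the trip count a compile-time template parameter so the compiler can replicate the loop body instead of branching at runtime. A minimal standard-C++ sketch of the idea; the actual flash-attention kernels are CUDA and organized differently:

```cpp
#include <cstddef>
#include <cstdio>
#include <utility>

// Expands to f(0), f(1), ..., f(N-1): N is a template parameter, so the
// compiler emits N copies of the body with no runtime loop.
template <typename F, std::size_t... Is>
void unroll_impl(F &&f, std::index_sequence<Is...>) {
    (f(Is), ...);
}

template <std::size_t N, typename F>
void unroll(F &&f) {
    unroll_impl(std::forward<F>(f), std::make_index_sequence<N>{});
}

int main() {
    float acc[4] = {0};
    unroll<4>([&](std::size_t i) { acc[i] += 2.0f; }); // fully unrolled
    std::printf("%.1f %.1f %.1f %.1f\n", acc[0], acc[1], acc[2], acc[3]);
    return 0;
}
```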
b5327
llama : do not crash if there is no CPU backend (#13395)

* llama : do not crash if there is no CPU backend
* add checks to examples
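A sketch of the defensive pattern such a fix implies: treat a missing backend as a reportable error instead of dereferencing a null handle. The registry calls below are from the public ggml-backend API; the specific call sites changed in the PR are not shown here:

```cpp
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // The registry can return NULL when no CPU backend was built or loaded;
    // previously this case could lead to a crash further down.
    ggml_backend_t backend = ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_CPU, NULL);
    if (backend == NULL) {
        fprintf(stderr, "error: no CPU backend available\n");
        return 1;
    }
    // ... use the backend ...
    ggml_backend_free(backend);
    return 0;
}
```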
b5326
CUDA: fix crash on large batch size for MoE models (#13384)
b5325
imatrix : Add --parse-special for enabling parsing of special tokens …
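With this option enabled, the imatrix tool also parses special tokens (e.g. chat-template control tokens) when tokenizing the calibration text. A typical invocation, assuming the tool's standard `-m`/`-f` options: `llama-imatrix -m model.gguf -f calibration.txt --parse-special`.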
b5324
llama-run: add support for downloading models from ModelScope (#13370)

Signed-off-by: Xiaodong Ye <[email protected]>
b5323
mtmd : fix batch_view for m-rope (#13397)

* mtmd : fix batch_view for m-rope
* nits : fix comment
b5322
llama : one-off chat template fix for Mistral-Small-2503 (#13398)

* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
b5321
rpc : add rpc_msg_set_tensor_hash_req (#13353)

* rpc : add rpc_msg_set_tensor_hash_req

  Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH, which makes the code cleaner.

* fix
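A hedged sketch of the pattern described: collect the RPC_CMD_SET_TENSOR_HASH request fields into one dedicated, fixed-layout struct instead of serializing them ad hoc. The field names below are illustrative assumptions; see #13353 for the actual layout:

```cpp
#include <cstdint>

// Illustrative only: field names are assumptions, not the layout from #13353.
// The point of the pattern is that everything the server needs for
// RPC_CMD_SET_TENSOR_HASH travels in a single fixed-size message.
struct rpc_msg_set_tensor_hash_req {
    uint64_t tensor_id; // assumed: which remote tensor to write
    uint64_t offset;    // assumed: byte offset within the tensor
    uint64_t hash;      // assumed: hash identifying previously cached data
};
```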