Releases: ggml-org/llama.cpp

b5460

22 May 23:26
3079e9a
release : fix windows hip release (#13707)

* release : fix windows hip release

* make single hip release with multiple targets

b5459

22 May 19:38
8a1d206
tts : fix n_ubatch + make WavTokenizer cache-less (#13713)

ggml-ci

b5458

22 May 19:02
797990c
mtmd : add ultravox audio input (#13623)

* convert ok, load ok

* warmup ok

* test

* still does not work?

* fix padding

* temporary give up

* fix merge conflict

* build_ultravox()

* rm test

* fix merge conflict

* add necessary mtmd APIs

* first working version (only 4s of audio)

* will this monster compile?

* fix compile

* please compile

* fPIC

* fix windows

* various fixes

* clean up audio_helpers

* fix conversion

* add some debug stuff

* long audio input ok

* adapt the api

* add --audio arg

* final touch UX

* add miniaudio to readme

* fix typo

* refactor kv metadata

* mtmd_default_marker()

b5456

22 May 14:15
cc74d5b
server : pad small embedding batches (#13692)

ggml-ci

b5454

22 May 12:51
d394a9a
sycl : Remove waits from function calls (#13702)

* removes the waits in async memcpy functions

b5453

22 May 09:01
6b56a64
SYCL: Avoid using with SYCL-Graph for unsupported nodes (#13587)

Currently, when running `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0`
with SYCL on a CUDA backend, two operations throw an exception because they
perform blocking waits while the queue is being recorded.

* `-o CONCAT`: Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
* `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074

We've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458)
method, which checks whether a graph can actually be used even when graphs are
enabled. I've taken a similar approach in this PR by adding a method to
`ggml-sycl.cpp` that checks whether a graph can be used for the operations, even
if the user has asked for graphs to be enabled.
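A minimal C++ sketch of the kind of compatibility check described above, assuming a simplified node list. `op_kind`, `graph_is_compatible`, and `should_use_graph` are hypothetical names, not the method added to `ggml-sycl.cpp`, and only the two ops listed above (`CONCAT` and `MUL_MAT_ID`) are treated as incompatible.

```cpp
#include <vector>

// Hypothetical stand-in for ggml op types; only the ops relevant to this sketch.
enum class op_kind { ADD, MUL_MAT, CONCAT, MUL_MAT_ID };

// Returns false if any node would perform a blocking wait while the queue
// is being recorded, so graph recording must be skipped for this graph.
bool graph_is_compatible(const std::vector<op_kind> & nodes) {
    for (op_kind op : nodes) {
        switch (op) {
            case op_kind::CONCAT:      // blocking wait on the recording queue
            case op_kind::MUL_MAT_ID:  // blocking wait for a copy to host memory
                return false;
            default:
                break;
        }
    }
    return true;
}

// Use a graph only if the user enabled graphs AND the graph is compatible;
// otherwise fall back to executing the nodes directly.
bool should_use_graph(bool user_enabled, const std::vector<op_kind> & nodes) {
    return user_enabled && graph_is_compatible(nodes);
}
```

In this sketch, `user_enabled` would correspond to running with `GGML_SYCL_DISABLE_GRAPH=0`; the point of the check is to fall back instead of throwing when a recorded op would block.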

b5452

21 May 23:34
a4e8912
opencl: Add support for multiple devices (#12622)

* opencl: Add support for multiple devices

... but limited to one platform. A platform with a GPU will be preferred.

Additionally:

* Filter out devices that lack capabilities needed by the backend
  implementation (half support, OpenCL 2.0+, etc).

* Make ggml_backend_opencl_reg() thread-safe.

* fixup: fix an error in sync_with_other_backends

... when there is only one OpenCL device available.
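The device filtering and platform preference can be sketched as follows. This is a hypothetical C++ illustration, not the backend's code: `device_info`, `platform_info`, and the helper functions are invented for the example, and only the two requirements named above (half support, OpenCL 2.0+) are checked, since the "etc." is not spelled out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical summary of what would be queried for each device.
struct device_info {
    bool is_gpu;
    bool supports_fp16;  // "half support"
    int  cl_major;       // OpenCL version reported by the device
    int  cl_minor;
};

struct platform_info {
    std::vector<device_info> devices;
};

// Assumed minimum requirements: fp16 support and OpenCL 2.0 or newer.
static bool device_is_usable(const device_info & d) {
    return d.supports_fp16 && d.cl_major >= 2;
}

// Drop devices that lack the capabilities the backend needs.
std::vector<device_info> usable_devices(const platform_info & p) {
    std::vector<device_info> out;
    for (const device_info & d : p.devices) {
        if (device_is_usable(d)) {
            out.push_back(d);
        }
    }
    return out;
}

// Pick a single platform: the first one with a usable GPU, otherwise the
// first one with any usable device. Returns -1 if none qualifies.
int select_platform(const std::vector<platform_info> & platforms) {
    int fallback = -1;
    for (std::size_t i = 0; i < platforms.size(); ++i) {
        for (const device_info & d : usable_devices(platforms[i])) {
            if (d.is_gpu) {
                return static_cast<int>(i);
            }
            if (fallback < 0) {
                fallback = static_cast<int>(i);
            }
        }
    }
    return fallback;
}
```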

b5451

21 May 20:56
edbf42e
opencl: fix couple crashes (#12795)

* opencl: fix couple crashes

* fix kernel launches that failed on devices which do not support
  non-uniform work-groups. When non-uniform work-groups are not
  supported, set `local_work_size` to NULL (= let the driver choose the
  work-group sizes). This patch does not cover every case, only the
  ones tested by test-backend-ops.

* fix sub-buffer creation that failed because `cl_buffer_region::origin`
  was not aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.

* OpenCL: query non-uniform WG sizes only on OpenCL 3.0+
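A hedged C++ sketch of the two fixes, using hypothetical helpers rather than the backend's actual code. It assumes `CL_DEVICE_MEM_BASE_ADDR_ALIGN` is reported in bits (as the OpenCL specification defines it) and is a power of two, and that the kernel can skip the leftover bytes itself. `pick_local_size` also keeps the requested size when it divides the global size evenly, which goes beyond what the note above states.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result of splitting a requested sub-buffer offset.
struct subbuffer_split {
    std::size_t origin;    // aligned value for cl_buffer_region::origin
    std::size_t residual;  // bytes the kernel must still skip past origin
};

// Round the offset down to the device's base-address alignment.
// CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits, so convert to bytes first.
subbuffer_split split_offset(std::size_t offset, unsigned base_addr_align_bits) {
    const std::size_t align = base_addr_align_bits / 8;
    assert(align != 0 && (align & (align - 1)) == 0);  // assumed power of two
    const std::size_t origin = offset & ~(align - 1);
    return {origin, offset - origin};
}

// Choose the local_work_size argument for a 1-D launch. Returning nullptr lets
// the driver pick the work-group size when non-uniform work-groups are not
// supported and the requested size would leave a partial group.
const std::size_t * pick_local_size(bool non_uniform_supported,
                                    std::size_t global_size,
                                    const std::size_t * requested) {
    if (requested == nullptr || non_uniform_supported) {
        return requested;
    }
    if (*requested != 0 && global_size % *requested == 0) {
        return requested;  // groups are uniform anyway
    }
    return nullptr;
}
```

For example, with a 1024-bit (128-byte) alignment, a requested offset of 200 bytes becomes an origin of 128 plus a residual of 72 bytes.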

b5450

21 May 20:42
d643bb2
releases : build CPU backend separately (windows) (#13642)

b5449

21 May 17:57
8e186ef
hparams : support models for which all layers use SWA (#13682)

ggml-ci