NNPA is disabled by default. To enable it:

```bash
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS \
    -DGGML_NNPA=ON

cmake --build build --config Release -j $(nproc)
```
For debug builds:

```bash
# Assumed to mirror the release configuration above, with the build type switched to Debug
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Debug \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS \
    -DGGML_NNPA=ON

cmake --build build --config Debug -j $(nproc)
```
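As a quick sanity check after either build, you can run the CLI binary against any locally available GGUF model (the path below is a placeholder; on IBM Z the model must already be converted to Big-Endian):

```bash
# Placeholder model path - use any GGUF model converted to Big-Endian
./build/bin/llama-cli -m /path/to/model-be.gguf -p "Hello from IBM Z" -n 32
```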
Only available on IBM z15/LinuxONE 3 or later systems with the `-DGGML_VXE=ON` compile flag (turned on by default). No hardware acceleration is possible with llama.cpp on older systems such as IBM z14/arch12; on such systems, the APIs can still run but will use a scalar implementation.
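If you are unsure which vector facilities your machine exposes, one way to check (assuming the usual s390x hwcap names reported by the kernel) is:

```bash
# Facility names ("vx", "vxe", "vxe2") are assumed to match the kernel's hwcap strings;
# they may differ or be absent on older kernels
grep -m1 '^features' /proc/cpuinfo | tr ' ' '\n' | grep -E -x 'vx|vxe|vxe2'
```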
### 2. NNPA Vector Intrinsics Acceleration
Only available on IBM z16/LinuxONE 4 or later systems with the `-DGGML_NNPA=ON` compile flag (turned off by default). No hardware acceleration is possible with llama.cpp on older systems such as IBM z15/arch13; on such systems, the APIs can still run but will use a scalar implementation.
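A similar check for the NNP-assist facility (again assuming the kernel reports it under the `nnpa` hwcap name):

```bash
# "nnpa" as a hwcap name is an assumption; older kernels may not report it at all
grep -m1 '^features' /proc/cpuinfo | grep -qw nnpa && echo "NNPA reported" || echo "NNPA not reported"
```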
### 3. zDNN Accelerator (WIP)
Only available on IBM z17/LinuxONE 5 or later systems with the `-DGGML_ZDNN=ON` compile flag. No hardware acceleration is possible with llama.cpp on older systems such as IBM z15/arch13; on such systems, the APIs will fall back to CPU routines.
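A configure sketch for a zDNN-enabled build, mirroring the OpenBLAS invocation shown earlier (the zDNN library itself must be installed on the system; exact package names vary by distribution and are not covered here):

```bash
# Sketch: same layout as the Release build above, with the zDNN backend enabled
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS \
    -DGGML_ZDNN=ON

cmake --build build --config Release -j $(nproc)
```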
### 4. Spyre Accelerator
_Only available with IBM z17 / LinuxONE 5 or later systems. No support is currently available._
Answer: We are aware of this as detailed in [this issue](https://github.com/ggml-org/llama.cpp/issues/14877). Please either try reducing the number of threads, or disable the compile option using `-DGGML_NNPA=OFF`.
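For example (model path and thread count below are placeholders):

```bash
# Workaround 1: run with fewer threads (value is a placeholder)
./build/bin/llama-cli -m /path/to/model-be.gguf -t 4 -p "Hello"

# Workaround 2: rebuild with NNPA disabled
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS \
    -DGGML_NNPA=OFF
cmake --build build --config Release -j $(nproc)
```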
## Getting Help on IBM Z & LinuxONE
1. **Bugs, Feature Requests**
## Appendix B: SIMD Support Matrix
|            | VX/VXE/VXE2 | NNPA | zDNN | Spyre |
|------------|-------------|------|------|-------|
| FP32       | ✅          | ✅   | ✅   | ❓    |
| FP16       | ✅          | ✅   | ❓   | ❓    |
| BF16       | 🚫          | 🚫   | ❓   | ❓    |
| Q4_0       | ✅          | ✅   | ❓   | ❓    |
| Q4_1       | ✅          | ✅   | ❓   | ❓    |
| MXFP4      | 🚫          | 🚫   | ❓   | ❓    |
| Q5_0       | ✅          | ✅   | ❓   | ❓    |
| Q5_1       | ✅          | ✅   | ❓   | ❓    |
| Q8_0       | ✅          | ✅   | ❓   | ❓    |
| Q2_K       | 🚫          | 🚫   | ❓   | ❓    |
| Q3_K       | ✅          | ✅   | ❓   | ❓    |
| Q4_K       | ✅          | ✅   | ❓   | ❓    |
| Q5_K       | ✅          | ✅   | ❓   | ❓    |
| Q6_K       | ✅          | ✅   | ❓   | ❓    |
| TQ1_0      | 🚫          | 🚫   | ❓   | ❓    |
| TQ2_0      | 🚫          | 🚫   | ❓   | ❓    |
| IQ2_XXS    | 🚫          | 🚫   | ❓   | ❓    |
| IQ2_XS     | 🚫          | 🚫   | ❓   | ❓    |
| IQ2_S      | 🚫          | 🚫   | ❓   | ❓    |
| IQ3_XXS    | 🚫          | 🚫   | ❓   | ❓    |
| IQ3_S      | 🚫          | 🚫   | ❓   | ❓    |
| IQ1_S      | 🚫          | 🚫   | ❓   | ❓    |
| IQ1_M      | 🚫          | 🚫   | ❓   | ❓    |
| IQ4_NL     | ✅          | ✅   | ❓   | ❓    |
| IQ4_XS     | ✅          | ✅   | ❓   | ❓    |
| FP32->FP16 | 🚫          | ✅   | ❓   | ❓    |
| FP16->FP32 | 🚫          | ✅   | ❓   | ❓    |

- ✅ - acceleration available
- 🚫 - acceleration unavailable, will still run using scalar implementation
- ❓ - acceleration unknown, please contribute if you can test it yourself