vulkan : add fp16 support for the conv_2d kernel #14872


Merged: 4 commits from vk_conv2d_fp16_knl into ggml-org:master on Jul 27, 2025

Conversation

Green-Sky
Collaborator

@Green-Sky Green-Sky commented Jul 25, 2025

This enables you to run an fp16 sd1.x model with sd.cpp.

This is my first time touching the vulkan code, feedback appreciated.

Related discussions: leejet/stable-diffusion.cpp#739

@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Jul 25, 2025
@Green-Sky Green-Sky force-pushed the vk_conv2d_fp16_knl branch from 2c62fd5 to f0f7b73 Compare July 25, 2025 09:52
Collaborator

@jeffbolznv jeffbolznv left a comment


LGTM. I haven't tested it locally, though.

@Green-Sky Green-Sky marked this pull request as ready for review July 25, 2025 13:46
@Green-Sky Green-Sky requested a review from 0cc4m as a code owner July 25, 2025 13:46
@github-actions github-actions bot added the testing Everything test related label Jul 25, 2025
Comment on lines 5143 to 5151
     for (auto act_case : cases) {
         test_cases.emplace_back(new test_conv_2d(
             { act_case[iwh_idx], act_case[iwh_idx], act_case[Cin_idx], act_case[B_idx] },
-            { act_case[kwh_idx], act_case[kwh_idx], act_case[Cin_idx], act_case[Cout_idx] }, 1, 1, 0, 0, 1, 1, false));
+            { act_case[kwh_idx], act_case[kwh_idx], act_case[Cin_idx], act_case[Cout_idx] },
+            GGML_TYPE_F32, 1, 1, 0, 0, 1, 1, false));
+        test_cases.emplace_back(new test_conv_2d(
+            { act_case[iwh_idx], act_case[iwh_idx], act_case[Cin_idx], act_case[B_idx] },
+            { act_case[kwh_idx], act_case[kwh_idx], act_case[Cin_idx], act_case[Cout_idx] },
+            GGML_TYPE_F16, 1, 1, 0, 0, 1, 1, false));
Collaborator


If we repeat the same test for different formats we typically loop through a ggml_type instead.

// glu ops
for (ggml_type type : {GGML_TYPE_F16, GGML_TYPE_F32}) {
    for (int v : {0, 1}) {
        for (int op = 0; op < GGML_GLU_OP_COUNT; op++) {
            for (bool swapped : {false, true}) {
                test_cases.emplace_back(new test_glu((ggml_glu_op) op, type, { 128, 2, 2, 2 }, v, swapped));
                test_cases.emplace_back(new test_glu((ggml_glu_op) op, type, { 5, 7, 11, 13 }, v, swapped));
            }
        }
    }
}

Collaborator Author


Done. Also added the f16 tests to the normal tests, since they seem to run fast.

@Green-Sky Green-Sky force-pushed the vk_conv2d_fp16_knl branch from 62cbfe3 to 4fa0331 Compare July 25, 2025 18:45
@Green-Sky
Collaborator Author

Looks like the error in a few cases is just slightly over the threshold.

$ bin/test-backend-ops test -o CONV_2D
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 2070 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 0 | matrix cores: KHR_coopmat
Testing 2 devices

Backend 1/2: Vulkan0
  Device description: NVIDIA GeForce RTX 2070
  Device memory: 8192 MB (8192 MB free)

[CONV_2D] NMSE = 0.000000127 > 0.000000100   CONV_2D(ne_input=[1,1,1,2],ne_kernel=[1,1,1,12],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL
[CONV_2D] NMSE = 0.000000115 > 0.000000100   CONV_2D(ne_input=[1,1,1,2],ne_kernel=[2,1,1,12],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL
[CONV_2D] NMSE = 0.000000243 > 0.000000100   CONV_2D(ne_input=[1,1,25,2],ne_kernel=[1,2,25,1],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL

...

  8129/8132 tests passed
  Backend Vulkan0: FAIL
Backend 2/2: CPU
  Skipping CPU backend
1/2 backends passed
FAIL

and here in CI:

 ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = llvmpipe (LLVM 15.0.7, 256 bits) (llvmpipe) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 8 | shared memory: 32768 | int dot: 0 | matrix cores: none
 ggml_vulkan: Warning: Device type is CPU. This is probably not the device you want.
 Testing 2 devices
 
 Backend 1/2: Vulkan0
   Device description: llvmpipe (LLVM 15.0.7, 256 bits)
   Device memory: 15995 MB (15995 MB free)

[CONV_2D] NMSE = 0.000000193 > 0.000000100   CONV_2D(ne_input=[1,1,25,2],ne_kernel=[2,1,25,1],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL
[CONV_2D] NMSE = 0.000000294 > 0.000000100   CONV_2D(ne_input=[1,1,25,2],ne_kernel=[1,2,25,1],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL
[CONV_2D] NMSE = 0.000001574 > 0.000000100   CONV_2D(ne_input=[1,1,25,2],ne_kernel=[3,1,25,1],type_kernel=f16,stride0=3,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL
[CONV_2D] NMSE = 0.000000101 > 0.000000100   CONV_2D(ne_input=[1,1,25,2],ne_kernel=[3,1,25,12],type_kernel=f16,stride0=3,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL

   8128/8132 tests passed
   Backend Vulkan0: FAIL
 Backend 2/2: CPU
   Skipping CPU backend
 1/2 backends passed
 FAIL

@Green-Sky
Collaborator Author

The error does not seem to be deterministic, is that expected?

(Just had only 2 cases surpass the error threshold)

@0cc4m
Collaborator

0cc4m commented Jul 25, 2025

Maybe another case of RTE?

Edit: No, just tried it, that does not resolve it.

@jeffbolznv
Collaborator

The error does not seem to be deterministic, is that expected?

Yes, the test values are randomly generated.

IIUC the kernel just promotes the fp16 values to fp32; nothing is done in fp16 math, so there ought not to be any precision issues (or at least, none worse than with fp32).

@etasnadi
Contributor

The error does not seem to be deterministic, is that expected?

(Just had only 2 cases surpass the error threshold)

I did not look into the cpu code but their f16 impl might use f16 for intermediate values - that could explain the divergence.

@jeffbolznv
Collaborator

I think you're right. I see this:

                        if (kernel_type == GGML_TYPE_F32) {
                            *(float *) element_ptr = src_val;
                        } else if (kernel_type == GGML_TYPE_F16) {
                            *(ggml_fp16_t *) element_ptr = GGML_CPU_FP32_TO_FP16(src_val);
                        }

If we eventually want to accelerate these operations using tensor cores then having the sources both in fp16 is what we'll want. So I think we should change the shader to convert the source values to fp16.

@Green-Sky
Collaborator Author

Green-Sky commented Jul 26, 2025

I hacked in the cast to the kernel type, but it still errors. The error is smaller now, though.
edit: CI agrees

eg

[CONV_2D] NMSE = 0.000000130 > 0.000000100   CONV_2D(ne_input=[1,1,1,2],ne_kernel=[1,1,1,1],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL
[CONV_2D] NMSE = 0.000000105 > 0.000000100   CONV_2D(ne_input=[1,1,25,2],ne_kernel=[2,2,25,1],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL

@0cc4m
Collaborator

0cc4m commented Jul 26, 2025

That change means the shader no longer works without fp16 compute support.

If we eventually want to accelerate these operations using tensor cores then having the sources both in fp16 is what we'll want. So I think we should change the shader to convert the source values to fp16.

The usual assumption was that the CPU backend would do at least 32-bit precision, while GPU backends sacrifice precision for performance. This doesn't seem to be true here. I don't really see why better precision should cause failed tests, maybe the threshold should be increased slightly. We definitely need a 32-bit shader version just to support old devices.

@netrunnereve
Collaborator

If you make the CPU implementation use FP32 only, do the errors go away?

@etasnadi
Contributor

I hacked in the cast to the kernel type, but it still errors. The error is smaller now, though. edit: CI agrees

eg

[CONV_2D] NMSE = 0.000000130 > 0.000000100   CONV_2D(ne_input=[1,1,1,2],ne_kernel=[1,1,1,1],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL
[CONV_2D] NMSE = 0.000000105 > 0.000000100   CONV_2D(ne_input=[1,1,25,2],ne_kernel=[2,2,25,1],type_kernel=f16,stride0=1,stride1=5,padding0=5,padding1=5,dilation0=2,dilation1=4,cwhn=0): FAIL

Maybe the NMSE threshold is too strict? Fp16 has only ~3–4 significant decimal digits, and I assume the CPU and Vulkan code do not execute the same calculations in the same order. So it might be useful to define a threshold that respects the number format. E.g. the closest fp16 number to 1/3 is 0.33325195 according to Wikipedia. If the numbers only differ from the fifth significant digit onwards, the test should be accepted.

@0cc4m
Collaborator

0cc4m commented Jul 27, 2025

@ggerganov @slaren Do you have an opinion on the test threshold? I don't want to reduce backend precision just to follow the CPU implementation.

@ggerganov
Member

I guess for test_conv_2d we can set the same max NMSE as for test_mul_mat: 5e-4.

@Green-Sky Green-Sky force-pushed the vk_conv2d_fp16_knl branch from a7da6ac to d6c0382 Compare July 27, 2025 08:42
Collaborator

@0cc4m 0cc4m left a comment


LGTM

@Green-Sky Green-Sky merged commit 89d1029 into ggml-org:master Jul 27, 2025
47 checks passed