vulkan : add fp16 support for the conv_2d kernel #14872
Conversation
Force-pushed from 2c62fd5 to f0f7b73
LGTM. I haven't tested it locally, though.
tests/test-backend-ops.cpp (outdated)
  for (auto act_case : cases) {
      test_cases.emplace_back(new test_conv_2d(
          { act_case[iwh_idx], act_case[iwh_idx], act_case[Cin_idx], act_case[B_idx] },
-         { act_case[kwh_idx], act_case[kwh_idx], act_case[Cin_idx], act_case[Cout_idx] }, 1, 1, 0, 0, 1, 1, false));
+         { act_case[kwh_idx], act_case[kwh_idx], act_case[Cin_idx], act_case[Cout_idx] },
+         GGML_TYPE_F32, 1, 1, 0, 0, 1, 1, false));
+     test_cases.emplace_back(new test_conv_2d(
+         { act_case[iwh_idx], act_case[iwh_idx], act_case[Cin_idx], act_case[B_idx] },
+         { act_case[kwh_idx], act_case[kwh_idx], act_case[Cin_idx], act_case[Cout_idx] },
+         GGML_TYPE_F16, 1, 1, 0, 0, 1, 1, false));
If we repeat the same test for different formats we typically loop through a ggml_type instead.
llama.cpp/tests/test-backend-ops.cpp, lines 4989 to 4996 in 793c0d7:
// glu ops
for (ggml_type type : {GGML_TYPE_F16, GGML_TYPE_F32}) {
    for (int v : {0, 1}) {
        for (int op = 0; op < GGML_GLU_OP_COUNT; op++) {
            for (bool swapped : {false, true}) {
                test_cases.emplace_back(new test_glu((ggml_glu_op) op, type, { 128, 2, 2, 2 }, v, swapped));
                test_cases.emplace_back(new test_glu((ggml_glu_op) op, type, { 5, 7, 11, 13 }, v, swapped));
            }
Done. Also added the f16 tests to the normal tests, since they seem to run fast.
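[Editor's note] For reference, a sketch of the looped form suggested above (not necessarily the exact lines that landed in the commit):

// Hypothetical consolidation: loop over the kernel type instead of
// duplicating each conv_2d case for f32 and f16.
for (auto act_case : cases) {
    for (ggml_type kernel_type : {GGML_TYPE_F32, GGML_TYPE_F16}) {
        test_cases.emplace_back(new test_conv_2d(
            { act_case[iwh_idx], act_case[iwh_idx], act_case[Cin_idx], act_case[B_idx] },
            { act_case[kwh_idx], act_case[kwh_idx], act_case[Cin_idx], act_case[Cout_idx] },
            kernel_type, 1, 1, 0, 0, 1, 1, false));
    }
}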
Force-pushed from 62cbfe3 to 4fa0331
Looks like the error in a few cases is just a little too much.
and here in CI:
The error does not seem to be deterministic; is that expected? (The latest run had only 2 cases surpass the error threshold.)
Maybe another case of RTE (round-to-nearest-even)? Edit: No, I just tried it; that does not resolve it.
Yes, the test values are randomly generated. IIUC the kernel just promotes the fp16 values to fp32; nothing is done with fp16 math, so there ought not to be any precision issues (or at least, none worse than with fp32).
I did not look into the CPU code, but its f16 implementation might use f16 for intermediate values, which could explain the divergence.
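[Editor's note] For illustration only (this is not the ggml CPU code, and the sizes and values are made up): a minimal C++ sketch, assuming a compiler with the _Float16 extension, showing how accumulating the same dot product with fp16 intermediates vs. promoted fp32 intermediates drifts apart by a few significant digits:

// Hypothetical sketch: compare fp16 vs. fp32 intermediate accumulation for
// one output element of a convolution-like dot product.
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    std::srand(42);
    const int n = 1024; // roughly KW*KH*Cin for a mid-sized conv_2d case

    std::vector<_Float16> a(n), b(n);
    for (int i = 0; i < n; ++i) {
        a[i] = (_Float16)((float)std::rand() / RAND_MAX - 0.5f);
        b[i] = (_Float16)((float)std::rand() / RAND_MAX - 0.5f);
    }

    _Float16 acc16 = 0; // fp16 intermediate values (one possible CPU path)
    float    acc32 = 0; // promote each product to fp32 before accumulating
    for (int i = 0; i < n; ++i) {
        acc16 = (_Float16)(acc16 + a[i] * b[i]); // rounded to fp16 each step
        acc32 += (float)a[i] * (float)b[i];
    }

    // The two sums typically agree to only a few significant digits, which
    // can be enough to trip a strict NMSE threshold for some random inputs.
    std::printf("fp16 accumulation: %.6f\n", (float)acc16);
    std::printf("fp32 accumulation: %.6f\n", (float)acc32);
    return 0;
}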
I think you're right. I see this:
If we eventually want to accelerate these operations using tensor cores, then having both sources in fp16 is what we'll want. So I think we should change the shader to convert the source values to fp16.
I hacked in the cast to the kernel type, but it still errors. The error is smaller, though, e.g.:
That change means the shader no longer works without fp16 compute support.
The usual assumption was that the CPU backend does at least 32-bit precision, while GPU backends sacrifice precision for performance. That doesn't seem to be true here. I don't really see why better precision should cause failed tests; maybe the threshold should be increased slightly. We definitely need a 32-bit shader version just to support old devices.
If you make the CPU implementation use FP32 only, do the errors go away?
Maybe the NMSE threshold is too strict? Fp16 has ~4 significant digits, and I assume the CPU and Vulkan code do not execute the same calculations in the same order, so it might be useful to define a threshold that respects the number format. E.g. the closest fp16 number to 1/3 is 0.33325195 according to Wikipedia; if the numbers differ only from the 5th digit onwards, the test should be accepted.
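[Editor's note] To make that concrete, here is a sketch (hypothetical helper names and values; not the actual code in tests/test-backend-ops.cpp) of an NMSE check whose threshold takes the source format into account:

#include <cmath>
#include <cstddef>

// Normalized mean squared error between a reference and a backend result.
static double nmse(const float * ref, const float * out, size_t n) {
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double d = (double)ref[i] - (double)out[i];
        num += d * d;
        den += (double)ref[i] * (double)ref[i];
    }
    return num / den;
}

// Hypothetical format-aware threshold: fp16 has a unit roundoff of 2^-11
// (~4.9e-4, i.e. roughly 3-4 significant decimal digits), so an NMSE on the
// order of its square is about the best that can be expected when CPU and
// GPU round and reorder the sums differently.
static double max_nmse_for(bool src_is_f16) {
    const double u_f16 = 0x1.0p-11;          // fp16 unit roundoff
    return src_is_f16 ? 4.0 * u_f16 * u_f16  // headroom for reordered sums
                      : 1e-7;                // assumed strict default for f32
}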
@ggerganov @slaren Do you have an opinion on the test threshold? I don't want to reduce backend precision just to follow the CPU implementation.
I guess for
Force-pushed from a7da6ac to d6c0382
LGTM
This enables you to run an fp16 sd1.x model with sd.cpp.
This is my first time touching the Vulkan code; feedback is appreciated.
Related discussions: leejet/stable-diffusion.cpp#739