Sometimes, especially when generating or decoding long, high-resolution videos on the CUDA or HIP backends, it crashes on an assert in ggml_cuda_cpy(). Out of frustration, I just commented out the asserts, and somehow everything seems to run just fine.

Looking at the code, I don't immediately see what purpose these asserts serve, so I'd say it might be safe to remove them if they're in your way. Am I missing something? Should I just propose the change upstream?
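For reference, the asserts I commented out are the byte-size guards in ggml's CUDA copy path. I'm quoting them approximately from memory, so the exact form may differ between versions:

```cpp
// ggml/src/ggml-cuda/cpy.cu, inside ggml_cuda_cpy() (approximate quote;
// the exact form varies between versions):
GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX);
GGML_ASSERT(ggml_nbytes(src1) <= INT_MAX);
```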
Replies: 1 comment

llama.cpp ran into the same issue when using Qwen3-Next-80B; it has been fixed in ggml-org/llama.cpp#18433.
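If I remember right, those asserts exist because the copy kernels index with 32-bit ints (that part is my assumption, not something confirmed by the fix): a tensor larger than INT_MAX bytes would overflow the index arithmetic and silently read or write the wrong offsets, so removing the asserts only trades a loud failure for quiet memory corruption. Here is a minimal standalone sketch of the failure mode they guard against, using hypothetical names rather than the actual ggml code:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical illustration of a copy routine that indexes with a 32-bit
// int, as the CUDA copy kernels are assumed to do. The assert is what keeps
// an oversized buffer from overflowing `i` and touching the wrong offsets.
static void copy_bytes_i32(uint8_t * dst, const uint8_t * src, size_t nbytes) {
    assert(nbytes <= (size_t) INT_MAX); // the guard in question
    for (int i = 0; i < (int) nbytes; ++i) {
        dst[i] = src[i];
    }
}

int main() {
    std::vector<uint8_t> src(16, 0xAB), dst(16, 0);
    copy_bytes_i32(dst.data(), src.data(), src.size());
    std::printf("dst[0] = 0x%02X\n", (unsigned) dst[0]); // prints dst[0] = 0xAB
    return 0;
}
```

So the robust fix is to widen the indexing (or chunk the copy into pieces under the limit) rather than to drop the guard.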