Sometimes, especially when generating or decoding long, high-resolution videos on the CUDA or HIP backends, it crashes on an assert in ggml_cuda_cpy(). Out of frustration, I just commented out the asserts, and somehow everything seems to run just fine.

Looking at the code, I don't immediately see what purpose these asserts serve, so I'd say it might be safe to remove them if they're in your way. Am I missing something? Should I just propose the change upstream?
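For reference, the asserts I commented out are the byte-size guards in ggml's CUDA copy path. I'm quoting them approximately from memory, so the exact form may differ between versions:

```cpp
// ggml/src/ggml-cuda/cpy.cu, inside ggml_cuda_cpy() (approximate quote;
// the exact form varies between versions):
GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX);
GGML_ASSERT(ggml_nbytes(src1) <= INT_MAX);
```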
Replies: 1 comment

llama.cpp ran into the same issue when using Qwen3-Next-80B; it has been fixed in ggml-org/llama.cpp#18433.
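If I remember right, those asserts exist because the copy kernels index with 32-bit ints (that part is my assumption, not something confirmed by the fix): a tensor larger than INT_MAX bytes would overflow the index arithmetic and silently read or write the wrong offsets, so removing the asserts only trades a loud failure for quiet memory corruption. Here is a minimal standalone sketch of the failure mode they guard against, using hypothetical names rather than the actual ggml code:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical illustration of a copy routine that indexes with a 32-bit
// int, as the CUDA copy kernels are assumed to do. The assert is what keeps
// an oversized buffer from overflowing `i` and touching the wrong offsets.
static void copy_bytes_i32(uint8_t * dst, const uint8_t * src, size_t nbytes) {
    assert(nbytes <= (size_t) INT_MAX); // the guard in question
    for (int i = 0; i < (int) nbytes; ++i) {
        dst[i] = src[i];
    }
}

int main() {
    std::vector<uint8_t> src(16, 0xAB), dst(16, 0);
    copy_bytes_i32(dst.data(), src.data(), src.size());
    std::printf("dst[0] = 0x%02X\n", (unsigned) dst[0]); // prints dst[0] = 0xAB
    return 0;
}
```

So the robust fix is to widen the indexing (or chunk the copy into pieces under the limit) rather than to drop the guard.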