There are two problems: 1. VAE crash, 2. Performance tuning

**1. VAE crash:**

log:

➜  ~ /data/data/com.termux/files/home/llama/bin/sd.sh
ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'

ggml_opencl: device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: 0800.46 Compiler E031.47.18.23
ggml_opencl: vector subgroup broadcast support: true
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
ggml_opencl: loading OpenCL kernels.........................................................
ggml_opencl: default device: 'QUALCOMM Adreno(TM) 830 (OpenCL 3.0 Adreno(TM) 830)'
[INFO ] stable-diffusion.cpp:192  - loading model from '/storage/emulated/0/Download/1dm/picxReal_10.safetensors'
[INFO ] model.cpp:1013 - load /storage/emulated/0/Download/1dm/picxReal_10.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:231  - loading vae from '/storage/emulated/0/Download/1dm/color101VAE_v1.safetensors'
[INFO ] model.cpp:1013 - load /storage/emulated/0/Download/1dm/color101VAE_v1.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:243  - Version: SD 1.x
[INFO ] stable-diffusion.cpp:277  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f16
  |==================================================| 1131/1131 - 500.00it/s
  |===================>                              | 443/1131 - 0.00it/s
[INFO ] stable-diffusion.cpp:558  - total params memory size = 2042.16MB (VRAM 2042.16MB, RAM 0.00MB): clip 307.44MB(VRAM), unet 1640.25MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:562  - loading model from '/storage/emulated/0/Download/1dm/picxReal_10.safetensors' completed, taking 1.47s
[INFO ] stable-diffusion.cpp:604  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:2017 - TXT2IMG
[INFO ] stable-diffusion.cpp:738  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1562 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1696 - get_learned_condition completed, taking 150 ms
[INFO ] stable-diffusion.cpp:1719 - sampling using DPM++ (2M) method
[INFO ] stable-diffusion.cpp:1768 - generating image: 1/1 - seed 42
  |==================================================| 2/2 - 26.07s/it
[INFO ] stable-diffusion.cpp:1806 - sampling completed, taking 52.28s
[INFO ] stable-diffusion.cpp:1814 - generating 1 latent images completed, taking 52.41s
[INFO ] stable-diffusion.cpp:1817 - decoding 1 latents
ggml_backend_opencl_buffer_type_alloc_buffer: requested size 1664.00 MiB exceeds max allocation size 1024.00 MiB, fallback to host RAM
ggml_gallocr_reserve_n: failed to allocate OpenCL buffer of size 1744830464
[ERROR] ggml_extend.hpp:1161 - vae: failed to allocate the compute buffer

/data/data/com.termux/files/home/llama/bin/sd.sh: line 15: 27015 Segmentation fault         /data/data/com.termux/files/home/llama/bin/sd -m /storage/emulated/0/Download/1dm/picxReal_10.safetensors --verbose --type f16 --width 512 --height 512 --steps 2 --vae /storage/emulated/0/Download/1dm/color101VAE_v1.safetensors -t 5 --cfg-scale 7.5 --sampling-method dpm++2m --prompt "girl" --negative-prompt "bad proportions, extra limbs, extra legs, extra legs and arms, disfigured, missing arms, simple background, transparent background, copyright, worst aesthetic, old, early, blurry, lowres, signature, artist name, watermark, twitter username, sketch, logo, furry, text, speech bubble, censored,angry boy,  girl smiling, boy shaded face " -o "/data/data/com.termux/files/home/sd_$(date +%s)_$RANDOM.png"


**Reason:**

AI-generated explanation:

The problem is an OpenCL memory allocation failure: `ggml_backend_opencl_buffer_type_alloc_buffer: requested size 1664.00 MiB exceeds max allocation size 1024.00 MiB`. The error occurs even when plenty of system memory (RAM) and zRAM is available, for the reasons below.

Cause summary:
GPU memory ≠ system memory
The OpenCL driver for Qualcomm Adreno GPUs (such as the 830 here) caps the size of a single allocation, and this device reports a cap of 1024 MB. However much system RAM is available, OpenCL's max mem alloc size cannot be exceeded.

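OpenCL limits the size of a single buffer. The log states:

ggml_opencl: max mem alloc size: 1024 MB

This means that no single buffer can exceed 1024 MB. However, the VAE decode step requests more than that:

requested size 1664.00 MiB exceeds max allocation size 1024.00 MiB, fallback to host RAM
ggml_gallocr_reserve_n: failed to allocate OpenCL buffer of size 1744830464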
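
As a quick check of the numbers in the log, the following sketch converts the failed request to MiB and compares it with the reported limit. It assumes that the "1024 MB" printed by `ggml_opencl` means 1024 MiB, which matches the "1024.00 MiB" in the later error line; the variable names are only illustrative.

```python
MIB = 1024 * 1024

# From "failed to allocate OpenCL buffer of size 1744830464"
requested_bytes = 1744830464
# From "max mem alloc size: 1024 MB" (assumed to mean MiB)
max_alloc_bytes = 1024 * MIB

requested_mib = requested_bytes / MIB
over_limit = requested_bytes > max_alloc_bytes
excess_mib = (requested_bytes - max_alloc_bytes) / MIB

print(f"requested: {requested_mib:.2f} MiB")
print(f"exceeds limit: {over_limit}, by {excess_mib:.2f} MiB")
```

The request is 640 MiB larger than the biggest single buffer the driver will hand out, so it cannot succeed as one allocation.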

The compute buffer is requested as one block, not piece by piece.
GGML tries to reserve the whole VAE compute buffer (1664 MiB here) in a single allocation. When that allocation fails, the VAE cannot run, and the process then crashes with a segmentation fault.

Solution:

Allocate large buffers in chunks, each no larger than the max mem alloc size.
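
A minimal sketch of the idea, not of how ggml actually allocates: `plan_chunks` is a hypothetical helper that only computes chunk sizes that each fit under the driver limit. A real implementation would also have to place tensors so that none spans two chunks, which this sketch ignores.

```python
MIB = 1024 * 1024


def plan_chunks(total_bytes: int, max_alloc_bytes: int) -> list[int]:
    """Split total_bytes into chunk sizes that each fit within max_alloc_bytes."""
    if total_bytes < 0 or max_alloc_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and max_alloc_bytes must be > 0")
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        # Take as much as the limit allows, or whatever is left.
        size = min(remaining, max_alloc_bytes)
        chunks.append(size)
        remaining -= size
    return chunks


# Values from the log: a 1664 MiB request against a 1024 MiB limit.
chunks = plan_chunks(1664 * MIB, 1024 * MIB)
print([c // MIB for c in chunks])  # [1024, 640]
```

With the sizes from the log, the request would become two allocations of 1024 MiB and 640 MiB, both within the reported limit.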

**2. Performance Tuning:**

The application does not limit GPU utilization, which can exhaust the system's graphics resources and crash the system interface. Could an option like llama.cpp's `-ngl`, or some other method, be added to limit resource consumption on the GPU?

![Image](https://github.com/user-attachments/assets/724a4b84-3243-4ce7-8465-264d633f32bf)


@leejet 

