[ET-VK] Segmentation Faults when running Vulkan Backend on certain Qualcomm chipsets #17557

@SS-JIA

Description

As the title says, vkCreateComputePipelines triggers a segmentation fault inside the driver code when creating compute pipelines for certain shaders on a Samsung S23.

Stack trace:

02-12 15:17:01.627 13994 13994 F DEBUG   : Cmdline: /data/local/tmp/etvk/execute_bpte /data/local/tmp/etvk/models/scenex_v9_512_vulkan_fp16.bpte
02-12 15:17:01.627 13994 13994 F DEBUG   : pid: 13984, tid: 13984, name: execute_bpte  >>> /data/local/tmp/etvk/execute_bpte <<<
02-12 15:17:01.627 13994 13994 F DEBUG   : uid: 2000
02-12 15:17:01.627 13994 13994 F DEBUG   : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
02-12 15:17:01.627 13994 13994 F DEBUG   : pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
02-12 15:17:01.627 13994 13994 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000000000000010
02-12 15:17:01.627 13994 13994 F DEBUG   : Cause: null pointer dereference
02-12 15:17:01.627 13994 13994 F DEBUG   :     x0  0000000000000000  x1  0000007fdfced698  x2  0000007fdfced6a0  x3  b40000734a6b3bc0
02-12 15:17:01.627 13994 13994 F DEBUG   :     x4  0000000000000000  x5  f291713a99c3f5e2  x6  b4000072c9e4f8e0  x7  0000000000000000
02-12 15:17:01.627 13994 13994 F DEBUG   :     x8  0000007fdfced6a0  x9  b40000734a69f400  x10 0000000000000000  x11 0000007fdfced720
02-12 15:17:01.627 13994 13994 F DEBUG   :     x12 0000000000000001  x13 0000000000000002  x14 000000000000002b  x15 0000000000000000
02-12 15:17:01.627 13994 13994 F DEBUG   :     x16 0000000000000022  x17 0000000000000001  x18 000000734e580000  x19 0000000000000000
02-12 15:17:01.627 13994 13994 F DEBUG   :     x20 b40000734a6b3bc0  x21 b4000072c9f7cca0  x22 0000007fdfced698  x23 0000000000000006
02-12 15:17:01.627 13994 13994 F DEBUG   :     x24 000000734dfbb000  x25 000000000000467e  x26 000000734dfbb000  x27 0000000000000015
02-12 15:17:01.627 13994 13994 F DEBUG   :     x28 0000000000000001  x29 0000007fdfced7f0
02-12 15:17:01.627 13994 13994 F DEBUG   :     lr  006c7872b74cf4cc  sp  0000007fdfced5e0  pc  00000072b73c19f4  pst 0000000060001000
02-12 15:17:01.627 13994 13994 F DEBUG   : 34 total frames
02-12 15:17:01.627 13994 13994 F DEBUG   : backtrace:
02-12 15:17:01.627 13994 13994 F DEBUG   :   NOTE: Function names and BuildId information is missing for some frames due
02-12 15:17:01.627 13994 13994 F DEBUG   :   NOTE: to unreadable libraries. For unwinds of apps, only shared libraries
02-12 15:17:01.627 13994 13994 F DEBUG   :   NOTE: found under the lib/ directory are readable.
02-12 15:17:01.627 13994 13994 F DEBUG   :   NOTE: On this device, run setenforce 0 to make the libraries readable.
02-12 15:17:01.627 13994 13994 F DEBUG   :   NOTE: Unreadable libraries:
02-12 15:17:01.627 13994 13994 F DEBUG   :   NOTE:   /data/local/tmp/etvk/execute_bpte
02-12 15:17:01.627 13994 13994 F DEBUG   :       #00 pc 00000000005bf9f4  /vendor/lib64/libllvm-qgl.so (!!!0000!94f922484c7c2a123d83cc9a7b0fc0!dc3d4da3a2!+84) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #01 pc 00000000006cd4c8  /vendor/lib64/libllvm-qgl.so (!!!0000!8aa9316cfa40ea8e13922cfdcda509!dc3d4da3a2!+120) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #02 pc 00000000006ccdd4  /vendor/lib64/libllvm-qgl.so (!!!0000!99fd46ca6897ca43f4eedd7822487a!dc3d4da3a2!+436) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #03 pc 0000000000910934  /vendor/lib64/libllvm-qgl.so (!!!0000!866bd28e17dc06a823006799f7570e!dc3d4da3a2!+532) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #04 pc 000000000090ccdc  /vendor/lib64/libllvm-qgl.so (!!!0000!2a7897fa7e385f84d70f5d88ea5046!dc3d4da3a2!+2508) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #05 pc 0000000000d68a64  /vendor/lib64/libllvm-qgl.so (!!!0000!4ee45c73f202da09ceb9e97299e78c!dc3d4da3a2!+724) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #06 pc 00000000006e41c8  /vendor/lib64/libllvm-qgl.so (!!!0000!e39e8cf324350f3c5a7f77e6d95208!dc3d4da3a2!+472) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #07 pc 00000000006e3af4  /vendor/lib64/libllvm-qgl.so (!!!0000!367303fb02553850da321d3446c78a!dc3d4da3a2!+100) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #08 pc 000000000081b5a4  /vendor/lib64/libllvm-qgl.so (!!!0000!0e406d1c583002d7aa7c873d54dca9!dc3d4da3a2!+372) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #09 pc 000000000081a314  /vendor/lib64/libllvm-qgl.so (!!!0000!115a3b096d9bc78c0dfb42d0e49024!dc3d4da3a2!+116) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #10 pc 0000000000819b54  /vendor/lib64/libllvm-qgl.so (!!!0000!aa916b5e953dd3dca1b992ddb2c964!dc3d4da3a2!+788) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #11 pc 0000000000d47d0c  /vendor/lib64/libllvm-qgl.so (!!!0000!51d38902a0381d361b611c909947d9!dc3d4da3a2!+60) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #12 pc 0000000000977bc0  /vendor/lib64/libllvm-qgl.so (!!!0000!34520e27c398aec80a9430978fab84!dc3d4da3a2!+1424) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #13 pc 0000000000972908  /vendor/lib64/libllvm-qgl.so (!!!0000!b351d96637f21e15c92b76750b44e2!dc3d4da3a2!+760) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #14 pc 0000000000971790  /vendor/lib64/libllvm-qgl.so (CreateQGLCProgram(QGPUCompiler::CompileData*)+48) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #15 pc 00000000009712d0  /vendor/lib64/libllvm-qgl.so (!!!0000!1e9735fa2d7fa7113c5ea09cbdfdc0!dc3d4da3a2!+320) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #16 pc 0000000000970040  /vendor/lib64/libllvm-qgl.so (!!!0000!aefccce6a332610a9b22f30d0961cc!dc3d4da3a2!+592) (BuildId: 197773a235861a62fb29a17d08291e53)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #17 pc 000000000004cfc4  /vendor/lib64/libllvm-glnext.so (!!!0000!3dcaee58dbbfbd4511f8fc7a97b9b9!dc3d4da3a2!+900) (BuildId: 9e51ef917b23889becdc61e58b6448fc)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #18 pc 000000000027dd0c  /vendor/lib64/hw/vulkan.adreno.so (!!!0000!9f8153b2695670b78964f3638e2666!dc3d4da3a2!+8076) (BuildId: 9ddb695a94bf97a272a018a299b56fb4)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #19 pc 000000000027b3f4  /vendor/lib64/hw/vulkan.adreno.so (!!!0000!2aa5082753cd3c7ad1b8091f24093d!dc3d4da3a2!+340) (BuildId: 9ddb695a94bf97a272a018a299b56fb4)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #20 pc 0000000000299b90  /vendor/lib64/hw/vulkan.adreno.so (!!!0000!4a8b3805ee4e9b1d8ce9b59e2f189a!dc3d4da3a2!+1072) (BuildId: 9ddb695a94bf97a272a018a299b56fb4)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #21 pc 000000000029941c  /vendor/lib64/hw/vulkan.adreno.so (qglinternal::vkCreateComputePipelines(VkDevice_T*, VkPipelineCache_T*, unsigned int, VkComputePipelineCreateInfo const*, VkAllocationCallbacks const*, VkPipeline_T**)+684) (BuildId: 9ddb695a94bf97a272a018a299b56fb4)
02-12 15:17:01.627 13994 13994 F DEBUG   :       #22 pc 0000000001049bc8  /data/local/tmp/etvk/execute_bpte
02-12 15:17:01.627 13994 13994 F DEBUG   :       #23 pc 0000000000f2310c  /data/local/tmp/etvk/execute_bpte
02-12 15:17:01.627 13994 13994 F DEBUG   :       #24 pc 0000000000f099ec  /data/local/tmp/etvk/execute_bpte
02-12 15:17:01.628 13994 13994 F DEBUG   :       #25 pc 0000000000f08918  /data/local/tmp/etvk/execute_bpte
02-12 15:17:01.628 13994 13994 F DEBUG   :       #26 pc 0000000000697780  /data/local/tmp/etvk/execute_bpte
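
To see at a glance where the crash sits, the backtrace frames can be grouped by library. This is a minimal sketch, assuming the tombstone text has been saved to a file (the name tombstone.txt used in the note below is hypothetical); the helper name frames_by_library is also made up for illustration.

```shell
# Count backtrace frames per library in an Android tombstone read from stdin.
# Frame lines look like: "... #NN pc <hex offset>  <library path> (<symbol>) ..."
frames_by_library() {
    sed -n 's/^.*#[0-9][0-9]* pc [0-9a-f]*  *\([^ ]*\).*$/\1/p' |
        sort | uniq -c | sort -rn
}
```

Running `frames_by_library < tombstone.txt` prints one line per library with its frame count. In the trace above, frames #00 through #16 all sit in /vendor/lib64/libllvm-qgl.so (including CreateQGLCProgram), reached from vkCreateComputePipelines in vulkan.adreno.so, which suggests the null dereference happens while the driver compiles the shader.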

Environment

Android NDK: 29.0.13846066
Vulkan SDK: 1.4.321.0

GLSLC version:

shaderc v2023.8 v2025.3
spirv-tools v2025.3 v2022.4-833-g33e02568
glslang 11.1.0-1253-gefd24d75

Target: SPIR-V 1.0

Repro steps

Ensure that the Vulkan SDK is installed (latest version is OK) and that glslc exists on your path:

glslc --version

The Android NDK must also be installed. Any NDK version past NDK r17c should suffice. Set the ANDROID_NDK environment variable to the install location:

export ANDROID_NDK=...
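
Both prerequisites can be checked in one go before starting the build. The following is only a sketch: check_prereqs is a hypothetical helper, and it assumes the NDK layout contains build/cmake/android.toolchain.cmake, the toolchain file passed to cmake later in these steps.

```shell
# Report missing prerequisites. Prints one line per problem and returns
# non-zero if anything is missing. Tool names are passed as arguments.
check_prereqs() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool"
            rc=1
        fi
    done
    # Assumption: a valid NDK install ships this CMake toolchain file.
    if [ ! -f "${ANDROID_NDK:-}/build/cmake/android.toolchain.cmake" ]; then
        echo "ANDROID_NDK does not point at an NDK install: '${ANDROID_NDK:-}'"
        rc=1
    fi
    return "$rc"
}
```

For this repro, `check_prereqs glslc adb cmake` covers the tools used in the steps below.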

Repository Setup

Set up the ExecuTorch repo. I prepared the ssj_s23_segv_repro branch to make the issue easier to reproduce.

git clone https://github.com/pytorch/executorch.git
cd executorch
git fetch
git checkout ssj_s23_segv_repro

git submodule update --init

Export Model

Build the Python libraries and install ExecuTorch into your Python environment. Run this from the ExecuTorch root:

./install_executorch.sh -e

Export the model to use for reproduction.

python ./cnn_toy.py
python ./cnn_toy.py --fp16

This should create two model files in the current dir:

$ ls | grep pte
cnn_toy_vulkan_fp16.pte
cnn_toy_vulkan_fp32.pte

Build libraries and model runner for Android

For this step, ensure ANDROID_NDK is set to the install path of the Android NDK.

From the ExecuTorch root, run:

cmake . \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android-so \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_SUPPORT_FLEXIBLE_PAGE_SIZES=ON \
    --preset "android-arm64-v8a" \
    -DANDROID_PLATFORM=android-28 \
    -DPYTHON_EXECUTABLE=python \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER_LAUNCHER=ccache \
    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
    -DEXECUTORCH_PAL_DEFAULT=posix \
    -DEXECUTORCH_BUILD_VULKAN=ON \
    -DEXECUTORCH_BUILD_TESTS=OFF \
    -DEXECUTORCH_BUILD_EXTENSION_EVALUE_UTIL=ON \
    -DEXECUTORCH_BUILD_EXECUTOR_RUNNER=ON \
    -DEXECUTORCH_ENABLE_EVENT_TRACER=ON \
    -Bcmake-out-android-so && \
cmake --build cmake-out-android-so -j16 --target install --config Release

Then, push model files to device and attempt inference:

export MODEL_PATH=./cnn_toy_vulkan_fp16.pte && \
export MODEL_FILE=$(basename ${MODEL_PATH}) && \
adb shell mkdir -p /data/local/tmp/etvk/models/ && \
adb push $MODEL_PATH /data/local/tmp/etvk/models/$MODEL_FILE && \
adb push cmake-out-android-so/executor_runner /data/local/tmp/etvk && \
adb shell /data/local/tmp/etvk/executor_runner --model_path /data/local/tmp/etvk/models/$MODEL_FILE
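
Since two models are exported, it may be convenient to wrap the push-and-run sequence in a function so each file can be tried in turn. This is a sketch of the same adb commands as above; the function name run_model_on_device and the DEVICE_DIR variable are introduced here for illustration only.

```shell
# On-device working directory, matching the paths used in the commands above.
DEVICE_DIR=/data/local/tmp/etvk

# Push one .pte file plus the runner binary, then attempt inference.
run_model_on_device() {
    model_path=$1
    model_file=$(basename "$model_path")
    adb shell mkdir -p "$DEVICE_DIR/models/" &&
    adb push "$model_path" "$DEVICE_DIR/models/$model_file" &&
    adb push cmake-out-android-so/executor_runner "$DEVICE_DIR" &&
    adb shell "$DEVICE_DIR/executor_runner" --model_path "$DEVICE_DIR/models/$model_file"
}
```

For example, `run_model_on_device ./cnn_toy_vulkan_fp32.pte` runs the fp32 export, which shows whether it behaves the same as the fp16 one.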

When executing the model, the runtime will log the compute pipeline currently being created:

[ET-VK] Skipping pipeline for shader: concat_2_buffer_float
[ET-VK] Creating pipeline 1/34: buffer_to_nchw_float_float
[ET-VK] Creating pipeline 2/34: mean_per_row_buffer_float
[ET-VK] Creating pipeline 3/34: clone_image_to_buffer_float_float

When the segmentation fault occurs, the output stops. The last printed shader name identifies the compute pipeline that triggered it.
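
If the runner output has been captured to a file, the offending shader name can be pulled out mechanically. This is a small sketch that assumes the log lines have exactly the "[ET-VK] Creating pipeline N/M: name" shape shown above; last_pipeline is a hypothetical helper name.

```shell
# Print the shader name from the last "[ET-VK] Creating pipeline N/M: <name>"
# line on stdin. "Skipping pipeline" lines are ignored.
last_pipeline() {
    tr -d '\r' |   # adb shell output may carry carriage returns
        sed -n 's|^.*\[ET-VK\] Creating pipeline [0-9]*/[0-9]*: *||p' |
        tail -n 1
}
```

Given the four sample log lines above, this prints clone_image_to_buffer_float_float.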

Inspecting GLSL/SPIR-V compute shaders

To inspect the GLSL/SPIR-V code of the shader:

$ cat cmake-out-android-so/vulkan_compute_shaders/clone_image_to_buffer_float_float.glsl
$ cat cmake-out-android-so/vulkan_compute_shaders/clone_image_to_buffer_float_float.spv
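
Note that a .spv file is a binary SPIR-V module, so cat prints raw bytes. A readable listing can be produced with spirv-dis from SPIRV-Tools, which should be available alongside glslc in a Vulkan SDK install (an assumption about your setup). The helper below is a sketch; inspect_shader and SHADER_DIR are names introduced here for illustration.

```shell
# Directory where the build places the generated shaders (as in the paths above).
SHADER_DIR=${SHADER_DIR:-cmake-out-android-so/vulkan_compute_shaders}

# Print the GLSL source of a shader, then a SPIR-V disassembly if spirv-dis
# is on PATH.
inspect_shader() {
    name=$1
    cat "$SHADER_DIR/$name.glsl" || return 1
    if command -v spirv-dis >/dev/null 2>&1; then
        # With no output file given, spirv-dis writes the disassembly to stdout.
        spirv-dis "$SHADER_DIR/$name.spv"
    else
        echo "spirv-dis not found on PATH; skipping SPIR-V disassembly" >&2
        return 1
    fi
}
```

For the shader named in the log above, this would be called as `inspect_shader clone_image_to_buffer_float_float`.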

cc @manuelcandales @digantdesai @cbilgin

Labels: module: vulkan