[GPU] Add bf16 data type support for models traced in bfloat16 #34714
Open
jpatrickiles-dev wants to merge 1 commit into openvinotoolkit:master from
Conversation
Models traced in bf16 (e.g., Qwen3.5 from transformers 5.x) fail on GPU devices that support f16 but not bf16 (Intel Arc Xe-LPG / Meteor Lake). The GPU plugin's ConvertPrecision pass maps bf16→f16, but KeepConstantsPrecisionAndAddConverts preserves bf16 constants feeding MatMul, causing bf16 to persist in the compiled graph.

Two-part fix:

1. Add bf16 (Datatype::BF16 / data_types::bf16) to the kernel selectors and OCL impl registrations for: slice, strided_slice, crop (variadic_split), eltwise (multiply/add/divide), activation (swish/sigmoid/sqrt), concatenation, reduce, gather, select, convolution, gemm. This enables the GPU to handle any bf16 tensors that survive the precision conversion passes.

2. Add a final bf16→f16 ConvertPrecision cleanup pass at the end of the GPU transformation pipeline. This catches bf16 that survives earlier passes due to KeepConstantsPrecision and store_original_precision interactions, converting it to f16, which the GPU natively supports. The cleanup pass uses convert_input_output_precision=true to ensure complete bf16 elimination.

Tested: the Qwen3.5-0.8B INT4 model now runs correctly on Intel Arc Xe-LPG (Meteor Lake iGPU), producing coherent text output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Title: [GPU] Add bf16 data type support for Slice, VariadicSplit and other ops
Details:
Models exported in bfloat16 (e.g. Qwen3.5 GatedDeltaNet hybrid models) fail at runtime on Intel Arc Xe-LPG and other GPU devices without native bf16 support:
No layout format available for variadicsplit/slice, impl_type: any (format: bfyx, data_type: bf16)
The GPU plugin's kernel selectors and OCL impl registrations lacked bf16 in their supported data types for multiple op types.
Root cause: 12 kernel selectors and 11 OCL impl registrations were missing Datatype::BF16 / data_types::bf16. Additionally, the ConvertPrecision(bf16→f16) pass ran too early in the transformations pipeline, leaving residual bf16 tensors that had no layout implementation.
Architecture context:
Qwen3.5 models use a hybrid GatedDeltaNet + full attention architecture where the GatedDeltaNet layers are traced in bfloat16. This is the first widely-used model family to expose this gap in the GPU plugin's bf16 op coverage. Previously, most models either used f16/f32 throughout or only used bf16 in matmul ops (which were already supported via XMX). The GatedDeltaNet's projection splitting and conv state operations introduce bf16 Slice, VariadicSplit, and related ops that had no registered layout format.
Fix:
Added Datatype::BF16 to kernel selectors for: slice, strided_slice, eltwise, activation, concatenation, reduce, gather, select, convolution, gemm
Added data_types::bf16 to OCL impl registrations for: slice, strided_slice, crop, eltwise, activation, concatenation, reduce, gather, select, gemm, convolution
Added a final bf16→f16 ConvertPrecision cleanup pass at the end of transformations_pipeline.cpp to catch any bf16 that survives earlier passes
Tickets:
N/A
Tested on:
Intel Arc Xe-LPG (Meteor Lake), kernel 6.19.4, Ubuntu 24.04, OpenVINO 2026.1. Validated with Qwen3.5-0.8B, 4B, and 9B INT4 models on GPU. No regression on existing Qwen3-8B INT4 workloads.
AI Assistance:
Yes. Claude was used for root-cause analysis and fix development. Human validation: rebuilt the GPU plugin locally and verified that Qwen3.5 0.8B/4B/9B all produce coherent output on Intel Arc Xe-LPG.