[GPU] Fix select-kernel indexing mismatch for lower-rank inputs #34537
Open
davidsnam-intel wants to merge 1 commit into openvinotoolkit:master from
Issue
Building the `Select` kernel can fail when NUMPY broadcasting is used with lower-rank inputs, particularly in the dynamic / new shape inference path.
Under NUMPY broadcasting semantics this should behave as:
However, some inputs may remain lower-rank during kernel JIT generation. This can lead to mismatched macro signatures in the generated OpenCL kernel, resulting in build failures such as:
Root cause
In the dynamic/new-shape-infer path, `Select` input layouts are not fully canonicalized to the output rank before kernel JIT generation.
As a result:
- The `Select` kernel may assume 5D indexing
- The `INPUT*_GET_INDEX_SAFE` macros correspond to lower-rank inputs
This mismatch leads to macro argument inconsistencies during OpenCL kernel compilation.
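As a toy illustration of this mismatch, the C++ sketch below models how many coordinates an index macro would accept for a given layout rank, and shows that padding a lower-rank input with leading 1s up to `max(4, output_rank)` makes the counts agree. The helper names (`index_macro_arity`, `pad_shape_from_begin`, `input_macro_matches_kernel`) and the arity rule are assumptions made for illustration; they are not the plugin's JIT code or its actual helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Hypothetical toy rule: an index macro generated for a layout of rank <= 4
// takes 4 coordinates (b, f, y, x); a rank-5 layout takes 5 (b, f, z, y, x).
// Ranks above 5 are out of scope for this sketch.
std::size_t index_macro_arity(std::size_t layout_rank) {
    return std::max<std::size_t>(4, layout_rank);
}

// Pad a shape with leading 1s up to target_rank, which keeps the trailing
// dimensions aligned the way NUMPY broadcasting expects.
Shape pad_shape_from_begin(const Shape& shape, std::size_t target_rank) {
    if (shape.size() >= target_rank) {
        return shape;
    }
    Shape padded(target_rank - shape.size(), 1);
    padded.insert(padded.end(), shape.begin(), shape.end());
    return padded;
}

// True when the macro generated for the input accepts as many coordinates as
// a kernel written for output_rank passes to it.
bool input_macro_matches_kernel(const Shape& input,
                                std::size_t output_rank,
                                bool canonicalize) {
    const std::size_t target_rank = std::max<std::size_t>(4, output_rank);
    const Shape layout =
        canonicalize ? pad_shape_from_begin(input, target_rank) : input;
    return index_macro_arity(layout.size()) == index_macro_arity(output_rank);
}
```

In this model, a rank-1 input `{6}` paired with a rank-5 output gets a 4-coordinate macro while the kernel passes 5 coordinates; once the input is padded to `{1, 1, 1, 1, 6}`, both sides use 5.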
Solution
This PR fixes the issue in two steps:
1. Canonicalize `Select` input layouts
   - In the `Select` impl, input shapes are extended to a consistent rank based on the output rank: `target_rank = max(4, output_rank)`
   - Shapes are extended with `extend_shape_to_rank_from_begin()` and layouts are adjusted with `format::adjust_to_rank()`.
2. Simplify `Select` kernel indexing
   - The `Select` kernel always uses consistent 5D indexing when `OUTPUT_DIMS == 5`.
   - The `GET_INDEX_SAFE` macros follow the same rank contract.
Graphs
Tickets