[Docs] Extend TopK spec with NaN handling info#33633
[Docs] Extend TopK spec with NaN handling info#33633Lagmator22 wants to merge 4 commits intoopenvinotoolkit:masterfrom
Conversation
78c65d3 to
6977aa3
Compare
|
@nshchego , could you please review? |
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug in the CPU reference implementation of TopK where NaN values in the input data would incorrectly appear in the top-K results, blocking valid numbers from being selected.
Changes:
- Added explicit NaN checks in the comparison logic of
topk_ref_processto ensure NaN values are treated as smaller than any valid number - Reformatted the
dataFomatsvector initialization for improved readability
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| std::vector<std::pair<LayoutType, LayoutType>> dataFomats{{LayoutType::ncsp, LayoutType::ncsp}, | ||
| std::vector<std::pair<LayoutType, LayoutType>> dataFomats { |
There was a problem hiding this comment.
Corrected spelling of 'dataFomats' to 'dataFormats'.
| {LayoutType::nspc, LayoutType::nspc}, {LayoutType::nCsp16c, LayoutType::nCsp16c}, { | ||
| LayoutType::nCsp8c, LayoutType::nCsp8c | ||
| } |
There was a problem hiding this comment.
The reformatting of the dataFomats vector splits related pairs across lines inconsistently. Consider placing each layout pair on its own line for better readability, or keeping all pairs on the same line if they fit within line length limits.
| {LayoutType::nspc, LayoutType::nspc}, {LayoutType::nCsp16c, LayoutType::nCsp16c}, { | |
| LayoutType::nCsp8c, LayoutType::nCsp8c | |
| } | |
| {LayoutType::nspc, LayoutType::nspc}, | |
| {LayoutType::nCsp16c, LayoutType::nCsp16c}, | |
| {LayoutType::nCsp8c, LayoutType::nCsp8c} |
6977aa3 to
6bbc587
Compare
|
I believe, need to describe this case in documentation: https://github.com/openvinotoolkit/openvino/tree/master/docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs/sort
|
|
Is there a plan to fix jit_uni_topk_kernel_f32 as well? |
|
Thanks @nshchego for the detailed review, I've tested and updated the documentation to explicitly clarify that openvino treats NaNs as smaller than valid numbers(consistent with NumPy behavior), which differs from PyTorch (where they are larger). Regarding the other points:
Ready for another look and thank u again! |
|
So, this case is undefined in product at all and we can't simply modify old operations. That may affect other customers. @mitruska, could you please take a look? Do we need a new TopK version for that? |
...rticles_en/documentation/openvino-ir-format/operation-sets/operation-specs/sort/top-k-11.rst
Outdated
Show resolved
Hide resolved
Adds std::isnan() check to the comparison loops in topk_ref_process to ensure NaN values are treated as smaller than any valid number, preventing them from blocking valid numbers from entering the top-K results. Fixes openvinotoolkit#33626
- Documented that NaN values are treated as smaller than valid numbers - Added comparison with NumPy (consistent) and PyTorch (differs) behavior - Clarified behavior in max mode with NaN-containing inputs
Extend the NaN handling note (already in TopK-11) to TopK-1 and TopK-3 operation specs for consistency. NaN values are treated as smaller than any valid number, matching NumPy behavior.
271cac7 to
f921135
Compare
|
Rebased onto latest master and extended the NaN handling documentation to TopK-1 and TopK-3 specs for consistency with TopK-11 (all 3 versions now describe the same NaN behavior). @mitruska - I understand this is pending your decision on whether NaN handling should be added to the existing TopK versions or introduced through a new operation version. Happy to go either route. If a new version is the way forward, I will try to help with that as well as a follow up. In the meantime, the code fix itself is minimal (2 lines in the reference implementation sort path) and all 120 existing smoke_TopK tests pass locally. The JIT kernel fix would be a separate follow-up as discussed with @nshchego. |
mitruska
left a comment
There was a problem hiding this comment.
Hi @Lagmator22, thank you for contributing to OpenVINO.
This PR can't be merged as a "fix" without common agreement (including all plugins) on the definition of "correct" behavior. IEEE‑754 defines NaN as something that fails ordered comparison. For any real number x (including ±inf) and any NaN, x < NaN, x > NaN, and x == NaN are all false, and even NaN == NaN is false. There’s no standardized and defined position for NaNs in a sort/TopK result, so implementations are free to keep NaNs in place, move NaNs to the end, or otherwise reorder NaNs. This explains the cross-framework/platform differences.
Given that frontend frameworks don’t explicitly document NaN ordering, my recommendation at this point is to keep OpenVINO TopK/ArgMax kernels using current comparisons (for perf) and document NaN ordering as implementation-defined or unspecified rather than adding NaN-specific kernel logic.
If deterministic behavior is needed for a model, it can be sanitized/masked before TopK, e.g. replace NaNs with -inf (push NaNs to bottom) or +inf (push NaNs to top).
Agree with @nshchego that such a change is a candidate for a new version of TopK, having a mode aligning NaN sort options with each frontend. Possible perf impact of additional NaN detection/handling should be measured for different platforms to avoid regressions, as all models using TopK can be affected (even if the NaN is not in the input, it's not possible to determine it at the conversion time for non-const inputs). Perf regression could be a blocker to use new version of TopK even if introduced.
If you are interested in follow-up work, a POC of a new TopK can be prepared to define constraints and benefits of having NaN behavior defined, while it needs to be supported with perf data and clear definition of expected behavior for different formats of models.
...rticles_en/documentation/openvino-ir-format/operation-sets/operation-specs/sort/top-k-11.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
This code is cpu specific, the common reference implementation used for constant folding, testing plugins and results comparison between them is here:
openvino/src/core/reference/include/openvino/reference/topk.hpp
Lines 110 to 125 in 619647f
|
@mitruska Thanks for the detailed review, makes complete sense. I'll revert the code change and update the docs to note NaN ordering as implementation-defined behavior instead. If that works for you we can merge this PR as a docs-only change, and then discuss the new TopK version separately. I'd really like to take on the new TopK version work if you're open to guiding me through it. I have a few questions so I can get started on the right track: For the new op version, should the NaN handling mode be an attribute (something like nan_mode with options for numpy vs torch style), or is there a different way you'd want it structured? Where should the reference implementation live, should I be modifying src/core/reference/include/openvino/reference/topk.hpp that you pointed out, or would the new version get its own file? For perf measurements, what benchmarks would you want to see? I'm thinking comparing the NaN-aware path vs current on different input sizes and NaN densities across CPU at minimum. Any specific setup or tooling you'd recommend? Is there an existing TopK version bump (like the jump from TopK-3 to TopK-11) I should study as a reference for how new versions are added end to end? Should I open a separate issue to track the new version? Would appreciate any pointers you have, really want to get this right. Thank you |
…correctly" This reverts commit 1d1007d.
|
Hi @mitruska, I've reverted the code change and updated this PR to be docs-only. The three TopK specs (TopK-1, TopK-3, TopK-11) now document NaN ordering as implementation-defined, consistent with your recommendation and IEEE 754 semantics. I also included the -inf/+inf sanitization guidance you suggested for users who need deterministic behavior. The PR should be straightforward to merge now since it's purely documentation with no code or perf impact. I'm currently in mid-semester exams (through March 20), but I'm very interested in the follow-up work on a new TopK version with configurable NaN handling modes, happy to start scoping the POC once exams wrap up if you're open to guiding me through it. Also, I am participating in gsoc for the #2 project idea(I have a demo repo as well). Would appreciate it if you could take a look when you get a chance. Thanks for the guidance! |
mitruska
left a comment
There was a problem hiding this comment.
Hi @Lagmator22, extending TopK spec with the proposed note reflecting current NaN behavior LGTM.
As this PR no longer provides code changes, please update the PR description.
Any further work should be provided as a separate PR.
Regarding new version of TopK POC proposal:
The new op version should be added to the new opset (opset17 need to be initialized), as a new class v17::TopK (reusing TopK Base) and can expose an attribute (enum) to control NaN handling. Suggested attribute: a small enum such as nan_mode with values like nan_as_smallest / nan_as_largest / none (for backward compatibility). For the reference the existing function can be reused / modified (src/core/reference/include/openvino/reference/topk.hpp) as long as the changes are backward compatibile. By default it should preserve current OpenVINO behavior.
Here are example PRs with new operators work:
With new op proposal please provide motivation of the changes, perfectly with model examples that would benefit from extended TopK NaN handling.
Further review and decisions can be make based on that.
|
Hi @mitruska, updated the PR description to reflect this is docs-only. @nshchego @kblaszczak-intel could you take a look when you get a chance? Just NaN handling notes added to the TopK-1/3/11 specs, no code changes. |
|
build_jenkins |
Details:
Extends the TopK operation specs (TopK-1, TopK-3, TopK-11) with documentation on NaN handling behavior.
What this PR does:
This is a docs-only change with no code modifications. The code fix for NaN handling was reverted per reviewer feedback -- a new TopK version with configurable NaN handling (nan_mode attribute) is planned as a follow-up.
Tickets: