
[Docs] Extend TopK spec with NaN handling info#33633

Open
Lagmator22 wants to merge 4 commits into openvinotoolkit:master from Lagmator22:fix/cpu-topk-nan-handling

Conversation


@Lagmator22 Lagmator22 commented Jan 15, 2026

Details:

Extends the TopK operation specs (TopK-1, TopK-3, TopK-11) with documentation on NaN handling behavior.

What this PR does:

  • Documents that NaN ordering in TopK results is implementation-defined, consistent with IEEE 754 semantics
  • Notes that different frameworks (NumPy, PyTorch) handle NaN ordering differently
  • Provides guidance for users who need deterministic behavior: sanitize NaN values before TopK (e.g. replace with -inf or +inf)
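The sanitization guidance in the last bullet can be sketched in plain C++. This is only an illustration under stated assumptions: `sanitize_nan` and `top_k_max` are hypothetical helpers, not OpenVINO APIs, and a real model would typically apply the replacement as a graph operation before TopK rather than on a host-side vector.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical helper (not an OpenVINO API): replace every NaN with `fill`.
// The default of -inf pushes former NaNs to the bottom of a max-mode TopK.
std::vector<float> sanitize_nan(std::vector<float> data,
                                float fill = -std::numeric_limits<float>::infinity()) {
    std::replace_if(data.begin(), data.end(),
                    [](float v) { return std::isnan(v); }, fill);
    return data;
}

// Hypothetical helper: the k largest values in descending order.
// Assumes the input holds no NaN, so std::greater is a valid ordering.
std::vector<float> top_k_max(std::vector<float> data, std::size_t k) {
    k = std::min(k, data.size());
    std::partial_sort(data.begin(),
                      data.begin() + static_cast<std::ptrdiff_t>(k),
                      data.end(),
                      std::greater<float>());
    data.resize(k);
    return data;
}
```

Passing `+inf` as `fill` instead moves the former NaN positions to the top of the result, which is the other option named in the guidance.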

This is a docs-only change with no code modifications. The code fix for NaN handling was reverted per reviewer feedback -- a new TopK version with configurable NaN handling (nan_mode attribute) is planned as a follow-up.

Tickets:

@Lagmator22 Lagmator22 requested review from a team as code owners January 15, 2026 20:53
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jan 15, 2026
@sys-openvino-ci sys-openvino-ci added the ExternalPR External contributor label Jan 15, 2026
@Lagmator22 Lagmator22 force-pushed the fix/cpu-topk-nan-handling branch from 78c65d3 to 6977aa3 Compare January 17, 2026 20:30
@maxnick maxnick requested a review from nshchego January 20, 2026 08:04
@maxnick maxnick added the pr: needs tests PR needs tests updating label Jan 20, 2026
@maxnick
Contributor

maxnick commented Jan 20, 2026

@nshchego , could you please review?

@maxnick maxnick requested a review from Copilot January 20, 2026 08:05
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a bug in the CPU reference implementation of TopK where NaN values in the input data would incorrectly appear in the top-K results, blocking valid numbers from being selected.

Changes:

  • Added explicit NaN checks in the comparison logic of topk_ref_process to ensure NaN values are treated as smaller than any valid number
  • Reformatted the dataFomats vector initialization for improved readability


}

std::vector<std::pair<LayoutType, LayoutType>> dataFomats{{LayoutType::ncsp, LayoutType::ncsp},
std::vector<std::pair<LayoutType, LayoutType>> dataFomats {

Copilot AI Jan 20, 2026


Corrected spelling of 'dataFomats' to 'dataFormats'.

Comment on lines +2018 to +2020
{LayoutType::nspc, LayoutType::nspc}, {LayoutType::nCsp16c, LayoutType::nCsp16c}, {
LayoutType::nCsp8c, LayoutType::nCsp8c
}

Copilot AI Jan 20, 2026


The reformatting of the dataFomats vector splits related pairs across lines inconsistently. Consider placing each layout pair on its own line for better readability, or keeping all pairs on the same line if they fit within line length limits.

Suggested change
{LayoutType::nspc, LayoutType::nspc}, {LayoutType::nCsp16c, LayoutType::nCsp16c}, {
LayoutType::nCsp8c, LayoutType::nCsp8c
}
{LayoutType::nspc, LayoutType::nspc},
{LayoutType::nCsp16c, LayoutType::nCsp16c},
{LayoutType::nCsp8c, LayoutType::nCsp8c}

@Lagmator22 Lagmator22 force-pushed the fix/cpu-topk-nan-handling branch from 6977aa3 to 6bbc587 Compare January 20, 2026 18:49
@Lagmator22
Author

Hi @nshchego @maxnick, I've rebased and cleaned up the PR by removing an accidental formatting change that was unrelated to the fix, so it now contains only the NaN handling fix (2 lines changed). Ready for review.

@Lagmator22
Author

Hi @maxnick, any update on this? @nshchego hasn't responded yet. I would be happy to add tests if needed.

@nshchego
Contributor

Is there a plan to fix jit_uni_topk_kernel_f32 as well?

@Lagmator22 Lagmator22 requested a review from a team as a code owner January 30, 2026 21:19
@Lagmator22 Lagmator22 requested review from kblaszczak-intel and removed request for a team January 30, 2026 21:19
@github-actions github-actions bot added the category: docs OpenVINO documentation label Jan 30, 2026
@Lagmator22
Author

Thanks @nshchego for the detailed review. I've tested and updated the documentation to explicitly clarify that OpenVINO treats NaNs as smaller than valid numbers (consistent with NumPy behavior), which differs from PyTorch (where they are larger).

Regarding the other points:

  • Test Cases: I tried injecting NaNs into the existing test inputs, but this causes failures because the test framework compares CPU output against the reference implementation. Since NaN != NaN and the relative order of NaNs is undefined, direct comparison fails even when the values are technically correct. For now, I've relied on the existing tests (which pass), since the reference implementation fix itself is verified.

  • JIT Kernel: As expected, I analyzed the x64 JIT kernel logic and confirmed it uses inconsistent comparison flags (_cmp_lt_os vs _cmp_nle_us), which would handle NaNs differently. Since I'm developing on ARM (Mac), where the JIT path isn't used, I can't verify a JIT fix locally. I'd propose addressing the JIT fix in a follow-up PR where we can leverage CI for x64 validation.
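To illustrate why the two comparison flags in the JIT bullet disagree on NaN, here is a scalar model of each predicate. It assumes `_cmp_lt_os` and `_cmp_nle_us` follow the usual x86 predicate semantics (ordered "less than" and unordered "not less-or-equal"); the functions below are hypothetical stand-ins, not the kernel code.

```cpp
#include <limits>

// Scalar model of an ordered "less than" predicate (assumed semantics of
// _cmp_lt_os): the result is false whenever either operand is NaN.
bool lt_ordered(float a, float b) {
    return a < b;
}

// Scalar model of an unordered "not less-or-equal" predicate (assumed
// semantics of _cmp_nle_us): the result is true whenever either operand is NaN.
bool nle_unordered(float a, float b) {
    return !(a <= b);
}
```

For ordinary numbers, `nle_unordered(a, b)` is the same as `lt_ordered(b, a)`. With a NaN operand the first returns true and the second returns false, so a kernel that mixes the two can place a NaN differently depending on which code path handles it.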

Ready for another look, and thank you again!

@nshchego
Contributor

So, this case is not defined in the product at all, and we can't simply modify old operations. That may affect other customers.
In my opinion, we should introduce a new operation version, define this case there, and add an attribute to differentiate the behavior if the supported frameworks take different approaches.

@mitruska, could you please take a look? Do we need a new TopK version for that?

Adds std::isnan() check to the comparison loops in topk_ref_process to ensure
NaN values are treated as smaller than any valid number, preventing them from
blocking valid numbers from entering the top-K results.

Fixes openvinotoolkit#33626
- Documented that NaN values are treated as smaller than valid numbers
- Added comparison with NumPy (consistent) and PyTorch (differs) behavior
- Clarified behavior in max mode with NaN-containing inputs
Extend the NaN handling note (already in TopK-11) to TopK-1 and TopK-3
operation specs for consistency. NaN values are treated as smaller than
any valid number, matching NumPy behavior.
@Lagmator22 Lagmator22 force-pushed the fix/cpu-topk-nan-handling branch from 271cac7 to f921135 Compare February 19, 2026 22:03
@Lagmator22
Author

Rebased onto latest master and extended the NaN handling documentation to TopK-1 and TopK-3 specs for consistency with TopK-11 (all 3 versions now describe the same NaN behavior).

@mitruska - I understand this is pending your decision on whether NaN handling should be added to the existing TopK versions or introduced through a new operation version. Happy to go either route. If a new version is the way forward, I will try to help with that as well as a follow up.

In the meantime, the code fix itself is minimal (2 lines in the reference implementation sort path) and all 120 existing smoke_TopK tests pass locally. The JIT kernel fix would be a separate follow-up as discussed with @nshchego.

Contributor

@mitruska mitruska left a comment


Hi @Lagmator22, thank you for contributing to OpenVINO.

This PR can't be merged as a "fix" without common agreement (including all plugins) on the definition of "correct" behavior. IEEE‑754 defines NaN as something that fails ordered comparison. For any real number x (including ±inf) and any NaN, x < NaN, x > NaN, and x == NaN are all false, and even NaN == NaN is false. There’s no standardized and defined position for NaNs in a sort/TopK result, so implementations are free to keep NaNs in place, move NaNs to the end, or otherwise reorder NaNs. This explains the cross-framework/platform differences.
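A small sketch of the point about ordered comparisons: a naive running maximum built only on `operator>` gives a result that depends on where the NaN sits in the input. `naive_max` is a hypothetical illustration, not OpenVINO code.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Naive running maximum using only operator>. Assumes a non-empty input.
// Every comparison against NaN is false, so a NaN is never replaced once it
// becomes `best`, and never selected when a real number is already `best`.
float naive_max(const std::vector<float>& data) {
    float best = data.front();
    for (std::size_t i = 1; i < data.size(); ++i) {
        if (data[i] > best) {
            best = data[i];
        }
    }
    return best;
}
```

With input `{1, NaN, 3}` the scan skips the NaN and returns 3, while `{NaN, 1, 3}` returns NaN because nothing compares greater than it. Both follow from the same comparison rule, which is why the position of NaN in a TopK result is not fixed by IEEE 754 alone.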

Given that frontend frameworks don’t explicitly document NaN ordering, my recommendation at this point is to keep OpenVINO TopK/ArgMax kernels using current comparisons (for perf) and document NaN ordering as implementation-defined or unspecified rather than adding NaN-specific kernel logic.

If deterministic behavior is needed for a model, it can be sanitized/masked before TopK, e.g. replace NaNs with -inf (push NaNs to bottom) or +inf (push NaNs to top).

Agree with @nshchego that such a change is a candidate for a new version of TopK, with a mode aligning NaN sort options with each frontend. The possible perf impact of additional NaN detection/handling should be measured on different platforms to avoid regressions, as all models using TopK can be affected (even if there is no NaN in the input, that can't be determined at conversion time for non-const inputs). A perf regression could block adoption of the new TopK version even if it is introduced.

If you are interested in follow-up work, a POC of a new TopK can be prepared to define constraints and benefits of having NaN behavior defined, while it needs to be supported with perf data and clear definition of expected behavior for different formats of models.

Contributor


This code is CPU-specific. The common reference implementation, used for constant folding, testing plugins, and comparing results between them, is here:

* @brief Reference implementation for TopK operator
*
* @param arg Pointer to input data.
* @param out_indices Pointer to output indicies.
* @param out_values Pointer to output values.
* @param in_shape Input data shape.
* @param out_shape Output data (values, indicies) shape.
* @param axis Axis for search of top K elements.
* @param k Number to find of top elements.
* @param compute_max Select mode of find max or min.
* @param sort Sorting type.
*/
template <typename T,
typename U,
typename std::enable_if<!std::is_same<typename std::decay<U>::type, int64_t>::value>::type* = nullptr>
void topk(const T* arg,

@Lagmator22
Author

@mitruska Thanks for the detailed review, makes complete sense. I'll revert the code change and update the docs to note NaN ordering as implementation-defined behavior instead. If that works for you we can merge this PR as a docs-only change, and then discuss the new TopK version separately.

I'd really like to take on the new TopK version work if you're open to guiding me through it. I have a few questions so I can get started on the right track:

For the new op version, should the NaN handling mode be an attribute (something like nan_mode with options for numpy vs torch style), or is there a different way you'd want it structured?

Where should the reference implementation live, should I be modifying src/core/reference/include/openvino/reference/topk.hpp that you pointed out, or would the new version get its own file?

For perf measurements, what benchmarks would you want to see? I'm thinking comparing the NaN-aware path vs current on different input sizes and NaN densities across CPU at minimum. Any specific setup or tooling you'd recommend?

Is there an existing TopK version bump (like the jump from TopK-3 to TopK-11) I should study as a reference for how new versions are added end to end?

Should I open a separate issue to track the new version?

Would appreciate any pointers you have, really want to get this right.

Thank you

@github-actions github-actions bot removed the category: CPU OpenVINO CPU plugin label Mar 11, 2026
@Lagmator22
Author

Hi @mitruska,

I've reverted the code change and updated this PR to be docs-only. The three TopK specs (TopK-1, TopK-3, TopK-11) now document NaN ordering as implementation-defined, consistent with your recommendation and IEEE 754 semantics. I also included the -inf/+inf sanitization guidance you suggested for users who need deterministic behavior.

The PR should be straightforward to merge now since it's purely documentation with no code or perf impact.

I'm currently in mid-semester exams (through March 20), but I'm very interested in the follow-up work on a new TopK version with configurable NaN handling modes. I'm happy to start scoping the POC once exams wrap up if you're open to guiding me through it. Also, I am participating in GSoC for the #2 project idea (I have a demo repo as well).

Would appreciate it if you could take a look when you get a chance. Thanks for the guidance!

@Lagmator22 Lagmator22 requested a review from mitruska March 12, 2026 07:37
@mitruska mitruska changed the title [CPU] Fix TopK reference implementation to handle NaN inputs correctly [Docs] Extend TopK spec with NaN handling info Mar 13, 2026
Contributor

@mitruska mitruska left a comment


Hi @Lagmator22, extending TopK spec with the proposed note reflecting current NaN behavior LGTM.

As this PR no longer provides code changes, please update the PR description.
Any further work should be provided as a separate PR.


Regarding new version of TopK POC proposal:

The new op version should be added to the new opset (opset17 needs to be initialized) as a new class v17::TopK (reusing TopK Base), and can expose an attribute (enum) to control NaN handling. Suggested attribute: a small enum such as nan_mode with values like nan_as_smallest / nan_as_largest / none (for backward compatibility). For the reference implementation, the existing function can be reused or modified (src/core/reference/include/openvino/reference/topk.hpp) as long as the changes are backward compatible. By default it should preserve current OpenVINO behavior.
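As a rough sketch of how such an attribute could drive the comparison, the snippet below models the three suggested values. `NanMode` and `ranks_higher` are hypothetical names chosen for illustration; they are not part of OpenVINO, and the real attribute and reference code may be structured differently.

```cpp
#include <cmath>

// Hypothetical enum mirroring the suggested nan_mode values.
enum class NanMode { none, nan_as_smallest, nan_as_largest };

// Returns true when `a` should rank ahead of `b` in max mode.
bool ranks_higher(float a, float b, NanMode mode) {
    if (mode == NanMode::none) {
        // Plain comparison, no NaN-specific logic: NaN position stays unspecified.
        return a > b;
    }
    const bool a_nan = std::isnan(a);
    const bool b_nan = std::isnan(b);
    if (a_nan && b_nan) {
        return false;  // two NaNs tie with each other
    }
    if (a_nan || b_nan) {
        // Exactly one operand is NaN: the mode decides which side wins.
        return (mode == NanMode::nan_as_largest) ? a_nan : b_nan;
    }
    return a > b;
}
```

In the two NaN-aware modes this comparator gives NaN a fixed rank, so it can be passed to a sort or partial sort safely. The `none` branch keeps the plain comparison, which is one way to read "preserve current behavior by default"; the extra `std::isnan` calls in the other branches are the kind of cost the perf measurements would need to quantify.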

Here are example PRs with new operators work:

With the new op proposal, please provide the motivation for the changes, ideally with model examples that would benefit from extended TopK NaN handling.

Further review and decisions can be made based on that.

@Lagmator22
Author

Hi @mitruska, updated the PR description to reflect this is docs-only. @nshchego @kblaszczak-intel could you take a look when you get a chance? Just NaN handling notes added to the TopK-1/3/11 specs, no code changes.

@praasz praasz removed the pr: needs tests PR needs tests updating label Mar 17, 2026
@praasz praasz added this to the 2026.2 milestone Mar 17, 2026
@praasz
Contributor

praasz commented Mar 17, 2026

build_jenkins


Labels

category: docs OpenVINO documentation ExternalPR External contributor


8 participants