
Conversation

@dcampora (Contributor) commented Oct 27, 2025

Purpose

This PR allows top_k_per_row to work for arbitrary logit values. Even if logits differ only in their least significant bits, top-k is now guaranteed to return a correct answer.

Solves #26554

Test Plan

The test test_top_k_per_row has been extended to cover these cases as well. They failed before this change and pass now.
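For context, a minimal sketch (plain Python, not the actual vLLM test) of the kind of case being added: logits that differ by only one ulp, where a reference top-k using exact comparisons must still rank them correctly.

```python
import math

# Hypothetical row of logits whose values differ only in the least
# significant bit of their float representation.
base = 1.0
row = [base, math.nextafter(base, 2.0), math.nextafter(base, 0.0), base]

k = 2
# Reference top-k: exact comparison on the full float values, with a
# stable tie-break on index, so 1-ulp differences are ranked correctly.
ref = sorted(range(len(row)), key=lambda i: (-row[i], i))[:k]
print(ref)  # [1, 0]: the 1-ulp-larger value first, then the first exact 1.0
```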

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the deepseek Related to DeepSeek models label Oct 27, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the top-k kernel to correctly handle logit values that differ only in their least significant bits, by using a multi-pass histogram approach on the full 32-bit float representation. The changes are extensive and introduce a more complex but precise algorithm.
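As a rough illustration (not the PR's CUDA kernel), histogram/radix selection on floats typically relies on a monotone mapping from the float32 bit pattern to an unsigned integer, so that bucketing by bits preserves numeric order down to the last ulp. A hypothetical Python sketch of that mapping:

```python
import struct

def f32_to_ordered_u32(x: float) -> int:
    """Reinterpret a float32 as a uint32 whose unsigned order matches
    the float's numeric order: negatives have all bits flipped, others
    get the sign bit set. Histogram passes over these bits can then
    select top-k exactly, even for values that differ by one ulp."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000

# Bit-level key order agrees with numeric order across signs and zeros.
vals = [-2.0, -0.0, 0.0, 1.0, 1.0000001]
keys = [f32_to_ordered_u32(v) for v in vals]
assert keys == sorted(keys)
```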

My review has identified several critical issues and areas for improvement:

  • There are critical bugs related to memory access and incorrect output indices when rowStart is not zero. The tests should be expanded to cover this case.
  • The logic for selecting the sorting algorithm in the host-side launcher functions appears to be flawed, potentially leading to significant performance degradation.
  • A fallback sorting algorithm within the kernel is misleadingly commented and has quadratic complexity, which can be a performance bottleneck.
  • There is some code duplication that should be addressed to improve maintainability.

I have provided specific comments with code suggestions to address these points. Overall, the direction is good, but these issues need to be fixed before merging.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


Signed-off-by: Daniel Campora <[email protected]>
