Skip to content

Conversation

@ShawnXuan
Copy link
Collaborator

This PR is mainly to support the NPU-compatible SortedNMS operator, so sorted_score and input_indices have been added.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds support for an NPU-compatible SortedNMS operator by extending both the Python-level API and underlying C++ implementations to accept sorted scores and input indices.

  • Python wrapper (nms_op) now branches for NPU to gather and pass sorted_scores and score_inds.
  • Core op registration and functor logic updated to include optional scores and input_indices, with new fused operator for NPU.
  • Data type inference and API signature modified for NPU support in nms_op.cpp and functional_api.yaml.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
python/oneflow/nn/modules/nms.py Branch NPU path to gather sorted scores and pass indices
oneflow/user/ops/nms_op.cpp InferNmsDataType now sets different dtype for NPU
oneflow/ir/include/OneFlow/OneFlowUserOps.td Added optional scores and input_indices inputs
oneflow/core/functional/impl/nn_functor.cpp Built fused_op_ and dispatch based on device and inputs
oneflow/core/functional/impl/array_functor.cpp Route argwhere through CPU for NPU
oneflow/core/functional/functional_api.yaml Updated nms signature with new optional tensor args
Comments suppressed due to low confidence (1)

oneflow/user/ops/nms_op.cpp:34

  • The CPU path sets output dtype to Int8, but NMS usually returns index tensors; Int32 or Int64 may be more appropriate.
ctx->SetOutputDType("out", 0, DataType::kInt8);

return OpInterpUtil::Dispatch<Tensor>(*op_, {x}, attrs);
DeviceType device_type = JUST(x->device())->enum_type();
if (device_type == DeviceType::kNPU) {
if (scores) {
Copy link

Copilot AI Jun 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fused NMS path checks only scores; it should verify both scores and input_indices are provided to avoid passing a null optional downstream.

Suggested change
if (scores) {
if (scores && input_indices) {

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Contributor

github-actions bot commented Jun 1, 2025

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.


- name: "nms"
signature: "Tensor (Tensor x, Float iou_threshold, Int32 keep_n=-1) => Nms"
signature: "Tensor (Tensor x, Tensor scores=None, Tensor input_indices=None, Float iou_threshold, Int32 keep_n=-1) => Nms"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

看看是否需要/方便,为npu的nms导出独立的api/functor?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants