
Making build.nvidia.com hosted inference easier #1711

Closed

randerzander wants to merge 7 commits into NVIDIA:main from randerzander:build_inference_presets

Conversation

@randerzander
Collaborator

Partially addresses #1669

Hopefully the commits are successfully signed this time.

@randerzander randerzander requested review from a team as code owners March 24, 2026 19:39
@randerzander randerzander force-pushed the build_inference_presets branch from 491ff70 to ecdbbc0 Compare March 26, 2026 13:22

if crop_b64s:
-    response_items = invoke_image_inference_batches(
+    response_items = invoke_nemotron_parse_batches(
Collaborator

Are you trying to remove the use of non-Nemotron models? This seems to overwrite the logic completely and no longer uses the other OCR models.

Collaborator Author

Claude:

The two functions are in entirely separate code paths: nemotron_parse_page_elements handles nemotron-parse, and the OCR function at line 567 handles nemotron-ocr. They were never sharing the call site.

if isinstance(response_item, str):
    return response_item.strip()
if isinstance(response_item, dict):
    tool_calls = response_item.get("tool_calls")
Collaborator

What is this logic supposed to accomplish? Are we expecting to send tool calls to the OCR actor?

Collaborator Author

See the example snippet for nemotron_parse inference on build.nvidia.com.

It requires tool calling and specifying the "markdown_bbox" tool.
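For illustration, a request body along these lines would ask the hosted endpoint for the markdown_bbox tool. This is a sketch: the tool name comes from the discussion above, but the model name, message layout, and the helper itself are assumptions, not the exact build.nvidia.com snippet.

```python
# Sketch of a chat-completions payload for hosted nemotron-parse inference.
# "markdown_bbox" is the tool name mentioned in this thread; the model name
# and the image-in-message convention below are illustrative assumptions.
import json


def build_parse_payload(image_b64: str, model_name: str = "nvidia/nemotron-parse") -> dict:
    """Build a chat-completions request that specifies the markdown_bbox tool."""
    return {
        "model": model_name,  # hypothetical hosted model name
        "messages": [
            {
                "role": "user",
                # Image passed inline as a base64 data URI (assumed convention).
                "content": f'<img src="data:image/png;base64,{image_b64}" />',
            }
        ],
        # The hosted endpoint requires tool calling with the markdown_bbox tool.
        "tools": [{"type": "function", "function": {"name": "markdown_bbox"}}],
    }


payload = build_parse_payload("iVBORw0KGgo=")
print(json.dumps(payload["tools"]))
```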

return "image/png"


def _normalize_chat_completions_response(response_json: Any) -> Any:
Collaborator

If this is OpenAI-specific (API), why is it in the NIM file?

Collaborator Author

Claude's explanation:

1. nim.py — invoke_nemotron_parse_batches: New function that sends images to the chat completions endpoint using tool calling (the markdown_bbox tool). The old code reused invoke_image_inference_batches, which doesn't speak the chat completions / tool-call contract that build.nvidia.com requires.

2. ocr.py — nemotron_parse_page_elements: Switch the remote call sites from invoke_image_inference_batches to invoke_nemotron_parse_batches, and add nemotron_parse_model_name kwarg support so callers can specify the hosted model name.

3. ocr.py — _extract_parse_text: Handle the tool call response format — drill into tool_calls[0].function.arguments (a JSON string), parse it, and extract the "markdown" key from the result.

4. ocr.py — NemotronParseActor: Add a nemotron_parse_model_name param so the model name flows through to the remote call.

5. nim.py — _normalize_chat_completions_response: Helper to unwrap choices[0].message from the chat completions envelope before passing to _extract_parse_text.

