
Making build.nvidia.com hosted inference easier #1711

Closed

randerzander wants to merge 7 commits into NVIDIA:main from randerzander:build_inference_presets

Conversation

@randerzander
Collaborator

Partially addresses #1669

Hopefully the commits are successfully signed this time.

@randerzander randerzander requested review from a team as code owners March 24, 2026 19:39
@randerzander randerzander force-pushed the build_inference_presets branch from 491ff70 to ecdbbc0 Compare March 26, 2026 13:22

if crop_b64s:
-    response_items = invoke_image_inference_batches(
+    response_items = invoke_nemotron_parse_batches(
Collaborator

Are you trying to remove the use of non-Nemotron models? This seems to overwrite the logic completely and no longer uses the other OCR models.

Collaborator Author

Claude:

The two functions are in entirely separate code paths: nemotron_parse_page_elements handles nemotron-parse, and the OCR function at line 567 handles nemotron-ocr. They were never sharing the call site.

if isinstance(response_item, str):
    return response_item.strip()
if isinstance(response_item, dict):
    tool_calls = response_item.get("tool_calls")
Collaborator

What is this logic supposed to accomplish? Are we expecting to send tool calls to the OCR actor?

Collaborator Author

See the example snippet for nemotron_parse inference on build.nvidia.com.

It requires tool calling and specifying the "markdown_bbox" tool.
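For illustration, a request body along these lines would ask the hosted endpoint for the markdown_bbox tool. This is a sketch: the tool name comes from the discussion above, but the model name, message layout, and the helper itself are assumptions, not the exact build.nvidia.com snippet.

```python
# Sketch of a chat-completions payload for hosted nemotron-parse inference.
# "markdown_bbox" is the tool name mentioned in this thread; the model name
# and the image-in-message convention below are illustrative assumptions.
import json


def build_parse_payload(image_b64: str, model_name: str = "nvidia/nemotron-parse") -> dict:
    """Build a chat-completions request that specifies the markdown_bbox tool."""
    return {
        "model": model_name,  # hypothetical hosted model name
        "messages": [
            {
                "role": "user",
                # Image passed inline as a base64 data URI (assumed convention).
                "content": f'<img src="data:image/png;base64,{image_b64}" />',
            }
        ],
        # The hosted endpoint requires tool calling with the markdown_bbox tool.
        "tools": [{"type": "function", "function": {"name": "markdown_bbox"}}],
    }


payload = build_parse_payload("iVBORw0KGgo=")
print(json.dumps(payload["tools"]))
```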

return "image/png"


def _normalize_chat_completions_response(response_json: Any) -> Any:
Collaborator

If this is OpenAI-specific (API), why is it in the NIM file?

Collaborator Author

Claude's explanation:

1. nim.py — invoke_nemotron_parse_batches: New function that sends images to the chat completions endpoint using tool calling (the markdown_bbox tool). The old code reused invoke_image_inference_batches, which doesn't speak the chat completions / tool-call contract that build.nvidia.com requires.

2. ocr.py — nemotron_parse_page_elements: Switch the remote call sites from invoke_image_inference_batches to invoke_nemotron_parse_batches, and add nemotron_parse_model_name kwarg support so callers can specify the hosted model name.

3. ocr.py — _extract_parse_text: Handle the tool call response format — drill into tool_calls[0].function.arguments (a JSON string), parse it, and extract the "markdown" key from the result.

4. ocr.py — NemotronParseActor: Add a nemotron_parse_model_name param so the model name flows through to the remote call.

5. nim.py — _normalize_chat_completions_response: Helper to unwrap choices[0].message from the chat completions envelope before passing to _extract_parse_text.

