Making build.nvidia.com hosted inference easier#1711
Making build.nvidia.com hosted inference easier#1711randerzander wants to merge 7 commits intoNVIDIA:mainfrom
Conversation
491ff70 to
ecdbbc0
Compare
|
|
||
| if crop_b64s: | ||
| response_items = invoke_image_inference_batches( | ||
| response_items = invoke_nemotron_parse_batches( |
There was a problem hiding this comment.
are you trying to remove the use of non nemotron models? this seems to overwrite logic completely and no longer use other OCR models.
There was a problem hiding this comment.
Claude:
The two functions are in entirely separate code paths: nemotron_parse_page_elements handles nemotron-parse, and the OCR function at line 567 handles nemotron-ocr. They were never sharing the call site.
| if isinstance(response_item, str): | ||
| return response_item.strip() | ||
| if isinstance(response_item, dict): | ||
| tool_calls = response_item.get("tool_calls") |
There was a problem hiding this comment.
What is this logic supposed to accomplish. Are we expecting to send tool calls to OCR actor?
There was a problem hiding this comment.
see the example snippet for nemotron_parse inference on build
it requires tool calling and specifying the "markdown_bbox" tool
| return "image/png" | ||
|
|
||
|
|
||
| def _normalize_chat_completions_response(response_json: Any) -> Any: |
There was a problem hiding this comment.
If this is openai specifc (API) why is this in the NIM file?
There was a problem hiding this comment.
Claude's explanation:
1. nim.py — invoke_nemotron_parse_batches: New function that sends images to the chat completions endpoint using tool calling (markdown_bbox tool). The old code reused invoke_image_inference_batches which doesn't speak the chat completions / tool-call contract that build.nvidia.com requires.
2. ocr.py — nemotron_parse_page_elements: Switch the remote call sites from invoke_image_inference_batches → invoke_nemotron_parse_batches, and add nemotron_parse_model_name kwarg support so callers can specify the hosted model name.
3. ocr.py — _extract_parse_text: Handle the tool call response format — drill into tool_calls[0].function.arguments (a JSON string), parse it, and extract the "markdown" key from the result.
4. ocr.py — NemotronParseActor: Add nemotron_parse_model_name param so the model name flows through to the remote call.
5. nim.py — _normalize_chat_completions_response: Helper to unwrap choices[0].message from the chat completions envelope before passing to _extract_parse_text.
Partially addresses #1669
Hopefully successfully signed commits this time