
fix(nvidia_nim): extract <think> reasoning blocks from content in transform_response#24276

Open
NIK-TIGER-BILL wants to merge 2 commits into BerriAI:main from NIK-TIGER-BILL:fix/nvidia-nim-parse-think-tags

Conversation

@NIK-TIGER-BILL

Summary

Fixes #24253

Problem

NVIDIA NIM reasoning models (e.g. minimax/minimax-m1) return their chain-of-thought inside a raw <think>…</think> block in choices[0].message.content. The expected behaviour is for litellm to split this into reasoning_content + content, as it does for DeepSeek R1 and other reasoning providers. Instead, reasoning_content is null and the raw tags appear in content.
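For illustration, the expected split can be reproduced with a small regex sketch. This is not litellm's actual helper (`_parse_content_for_reasoning()` lives in the shared response utilities); `split_think_block` and `THINK_RE` are hypothetical names used only to show the intended behaviour:

```python
import re

# Illustrative only: a leading <think>…</think> block is peeled off into
# reasoning text, and the remainder becomes the visible content.
THINK_RE = re.compile(r"^<think>(.*?)</think>", re.DOTALL)

def split_think_block(content: str):
    """Return (reasoning, visible_content) for a raw model response string."""
    match = THINK_RE.match(content)
    if match is None:
        return None, content  # no reasoning block present
    return match.group(1), content[match.end():]

reasoning, content = split_think_block(
    "<think>The user just sent ping...</think>\n\npong"
)
# reasoning -> "The user just sent ping..."
# content   -> "\n\npong"
```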

Root cause

The litellm response pipeline already calls _parse_content_for_reasoning() via _extract_reasoning_content() in convert_to_model_response_object(). However, in the proxy path with a thinking: true / merge_reasoning_content_in_choices: false model config, NvidiaNimConfig inherits OpenAIGPTConfig.transform_response(), which routes through convert_to_model_response_object(). That function only invokes _extract_reasoning_content() when the choice's message dictionary has a string content, and a subtle ordering issue in how the proxy wraps the response means the extraction step can be skipped for nvidia_nim.

Fix

Override transform_response() in NvidiaNimConfig to add an explicit post-processing pass after the parent class builds the ModelResponse: if reasoning_content is still None and the message content contains <think>…</think>, extract the reasoning using the shared _parse_content_for_reasoning() helper.
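A minimal standalone sketch of that post-processing pass, under stated assumptions: `parse_reasoning` and the `Message` class below are hypothetical stand-ins, while the real override operates on the litellm ModelResponse and calls the shared `_parse_content_for_reasoning()` helper.

```python
from typing import List, Optional

def parse_reasoning(content: str):
    # Stand-in for litellm's _parse_content_for_reasoning() helper.
    start, end = "<think>", "</think>"
    if content.startswith(start) and end in content:
        reasoning, _, rest = content[len(start):].partition(end)
        return reasoning, rest
    return None, content

class Message:
    # Stand-in for the choice message object on a ModelResponse.
    def __init__(self, content: str, reasoning_content: Optional[str] = None):
        self.content = content
        self.reasoning_content = reasoning_content

def post_process(messages: List[Message]) -> List[Message]:
    for message in messages:
        if message.reasoning_content is not None:
            continue  # already parsed upstream; nothing to do
        if not isinstance(message.content, str):
            continue
        reasoning, stripped = parse_reasoning(message.content)
        if reasoning is not None:
            message.reasoning_content = reasoning
            message.content = stripped
    return messages

msg = post_process([Message("<think>ping...</think>\n\npong")])[0]
# msg.reasoning_content -> "ping...", msg.content -> "\n\npong"
```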

After fix

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "\n\npong",
      "reasoning_content": "The user just sent ping..."
    }
  }]
}

NIK-TIGER-BILL added 2 commits March 21, 2026 07:05
Fixes BerriAI#24234

The embedding transformer already strips the 'openrouter/' prefix before
sending the model name to the API (see
litellm/llms/openrouter/embedding/transformation.py:112-113). The chat
transformer was missing the same guard: when litellm receives a model
string like
guard: when litellm receives a model string like
'openrouter/mistralai/mistral-7b-instruct' it internally preserves the
'openrouter/' prefix (get_llm_provider_logic.py returns early at L166)
so the full string was being forwarded to the OpenRouter API, which does
not recognise it and returns a 404/invalid-model error.

Fix: strip the 'openrouter/' prefix at the top of
OpenrouterConfig.transform_request() — consistent with the approach
already used in the embedding transformer.
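A standalone sketch of the guard described in this commit message; `strip_provider_prefix` is a hypothetical name for illustration, while the actual change lives inline in OpenrouterConfig.transform_request():

```python
OPENROUTER_PREFIX = "openrouter/"

def strip_provider_prefix(model: str) -> str:
    # Drop litellm's internal provider prefix so the raw model ID
    # (e.g. "mistralai/mistral-7b-instruct") reaches the OpenRouter API.
    if model.startswith(OPENROUTER_PREFIX):
        return model[len(OPENROUTER_PREFIX):]
    return model

print(strip_provider_prefix("openrouter/mistralai/mistral-7b-instruct"))
# prints "mistralai/mistral-7b-instruct"
```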
…nsform_response

Fixes BerriAI#24253

NVIDIA NIM reasoning models (e.g. minimax/minimax-m1) return their
chain-of-thought inside a raw <think>…</think> block embedded in
choices[0].message.content.  The litellm response pipeline already
calls _parse_content_for_reasoning() via _extract_reasoning_content()
in convert_to_model_response_object(), but a regression in the proxy
path can prevent that from running for nvidia_nim before the response
is surfaced to the caller — leaving reasoning_content=null and the raw
<think> tags in content.

Fix: override transform_response() in NvidiaNimConfig to explicitly
scan the content after the parent class has finished building the
ModelResponse.  If reasoning_content is still None and content
contains a <think>…</think> block, extract it using the shared
_parse_content_for_reasoning() helper.

Setting thinking: true / merge_reasoning_content_in_choices: false in
the proxy model config will now behave consistently with other
reasoning providers (DeepSeek R1 via OpenRouter, etc.).
@vercel

vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 21, 2026 7:23am


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


NIK-TIGER-BILL seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Have you signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq
Contributor

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing NIK-TIGER-BILL:fix/nvidia-nim-parse-think-tags (79a5629) with main (d8e4fc4)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR makes two independent fixes: (1) adds a transform_response override on NvidiaNimConfig that post-processes the parent-built ModelResponse to extract <think>…</think> reasoning blocks from content into reasoning_content, and (2) strips the openrouter/ provider-prefix from the model name in OpenrouterConfig.transform_request before forwarding it to the OpenRouter API.

Key changes:

  • NvidiaNimConfig.transform_response: calls super().transform_response() first, then iterates over choices — if reasoning_content is None and the content string contains a <think> block, it calls the shared _parse_content_for_reasoning() helper to split the two fields. The approach reuses existing infra and doesn't hard-code model names.
  • OpenrouterConfig.transform_request: adds a model.startswith("openrouter/") guard to strip the litellm-internal prefix so the raw model ID reaches the API. The async path is safe because OpenAIGPTConfig.async_transform_request delegates to self.transform_request for subclasses.
  • An unused cast import was added to nvidia_nim/chat/transformation.py — should be removed.
  • No unit test was added for the new reasoning-extraction logic in NvidiaNimConfig, which is required by the project's review standards to demonstrate that the reported issue is resolved.

Confidence Score: 3/5

  • PR is safe to merge for existing non-reasoning NIM models, but lacks automated test coverage for the new reasoning-extraction path.
  • The core logic is straightforward and correctly reuses an existing helper (_parse_content_for_reasoning). The guard (reasoning_content is not None → skip) prevents double-processing. However, no unit test was added for the new code path, leaving the fix unverified by CI. The unused cast import is trivial. Confidence is lowered mainly because of missing test coverage for the stated fix.
  • litellm/llms/nvidia_nim/chat/transformation.py — needs a unit test for the reasoning extraction path

Important Files Changed

Filename Overview
litellm/llms/nvidia_nim/chat/transformation.py Adds a transform_response override that post-processes the parent's ModelResponse to extract <think>…</think> blocks into reasoning_content. Logic is correct but an unused cast import remains, and no unit tests were added to verify the fix.
litellm/llms/openrouter/chat/transformation.py Adds openrouter/ prefix-stripping in transform_request before the model name is sent to the API. The async path is also covered because OpenAIGPTConfig.async_transform_request delegates to self.transform_request for non-base-class instances. Change looks correct.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant NvidiaNimConfig
    participant OpenAIGPTConfig
    participant _parse_content_for_reasoning

    Caller->>NvidiaNimConfig: transform_response(raw_response, ...)
    NvidiaNimConfig->>OpenAIGPTConfig: super().transform_response(...)
    OpenAIGPTConfig-->>NvidiaNimConfig: ModelResponse (reasoning_content=None, content="<think>...</think> answer")
    loop Each choice in result
        NvidiaNimConfig->>NvidiaNimConfig: check reasoning_content is None
        NvidiaNimConfig->>NvidiaNimConfig: check content is str
        NvidiaNimConfig->>_parse_content_for_reasoning: _parse_content_for_reasoning(content)
        _parse_content_for_reasoning-->>NvidiaNimConfig: (reasoning, stripped_content)
        NvidiaNimConfig->>NvidiaNimConfig: message.reasoning_content = reasoning
        NvidiaNimConfig->>NvidiaNimConfig: message.content = stripped_content
    end
    NvidiaNimConfig-->>Caller: ModelResponse (reasoning_content="...", content="answer")

Last reviewed commit: "fix(nvidia_nim): ext..."

API calling is done using the OpenAI SDK with an api_base
"""
from typing import Any, List, Optional, cast
Contributor


P2 Unused cast import

cast is imported from typing but never referenced anywhere in this file. Remove it to keep the import block clean.

Suggested change
from typing import Any, List, Optional, cast
from typing import Any, List, Optional

Comment on lines +120 to +179
def transform_response(  # type: ignore[override]
    self,
    model: str,
    raw_response: httpx.Response,
    model_response: ModelResponse,
    logging_obj: Any,
    request_data: dict,
    messages: List[Any],
    optional_params: dict,
    litellm_params: dict,
    encoding: Any,
    api_key: Optional[str] = None,
    json_mode: Optional[bool] = None,
) -> ModelResponse:
    """
    Override transform_response to extract <think>…</think> reasoning blocks
    produced by NVIDIA NIM reasoning models (e.g. minimax/minimax-m1).
    NIM forwards the raw model output as a plain OpenAI-compatible response:
    the <think> block appears inside ``choices[0].message.content`` and
    ``reasoning_content`` is absent. The parent class's
    ``_extract_reasoning_content`` helper already calls
    ``_parse_content_for_reasoning``, but that step can be skipped in the
    proxy path. We run the same parsing here on the built ModelResponse so
    that the upstream consumer always sees a clean split between
    ``reasoning_content`` and ``content``.
    """
    # Let the parent class build the base ModelResponse first.
    result = super().transform_response(
        model=model,
        raw_response=raw_response,
        model_response=model_response,
        logging_obj=logging_obj,
        request_data=request_data,
        messages=messages,
        optional_params=optional_params,
        litellm_params=litellm_params,
        encoding=encoding,
        api_key=api_key,
        json_mode=json_mode,
    )

    # Post-process: if reasoning_content is still None but the content
    # contains <think>…</think> raw tags, extract them now.
    for choice in getattr(result, "choices", []):
        message = getattr(choice, "message", None)
        if message is None:
            continue
        if getattr(message, "reasoning_content", None) is not None:
            # Already parsed, nothing to do.
            continue
        content = getattr(message, "content", None)
        if not isinstance(content, str):
            continue
        reasoning, stripped_content = _parse_content_for_reasoning(content)
        if reasoning is not None:
            message.reasoning_content = reasoning
            message.content = stripped_content

    return result
Contributor


P1 No unit test for the new reasoning extraction

This PR claims to fix issue #24253 but the tests/llm_translation/test_nvidia_nim.py file has no new test that covers the transform_response reasoning-extraction path. Without a mock-based test (e.g. constructing a fake httpx.Response whose body contains <think>…</think> and asserting reasoning_content / content are correctly split), there is no automated verification that the fix works and no regression guard.

Per the project's review standards, a fix claiming to resolve a reported issue should include passing tests as evidence. Please add a unit test similar to the following pattern:

def test_nvidia_nim_transform_response_extracts_reasoning():
    from unittest.mock import MagicMock
    import json, httpx
    from litellm.llms.nvidia_nim.chat.transformation import NvidiaNimConfig

    config = NvidiaNimConfig()
    fake_body = {
        "id": "test-id",
        "object": "chat.completion",
        "created": 1234567890,
        "model": "minimax/minimax-m1",
        "choices": [{"index": 0, "message": {"role": "assistant", "content": "<think>The user just sent ping...</think>\n\npong"}, "finish_reason": "stop"}],
        "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
    }
    raw_response = MagicMock(spec=httpx.Response)
    raw_response.text = json.dumps(fake_body)
    raw_response.json.return_value = fake_body
    raw_response.status_code = 200
    raw_response.headers = {}

    from litellm import ModelResponse
    result = config.transform_response(
        model="minimax/minimax-m1",
        raw_response=raw_response,
        model_response=ModelResponse(),
        logging_obj=MagicMock(),
        request_data={},
        messages=[],
        optional_params={},
        litellm_params={},
        encoding=None,
    )
    assert result.choices[0].message.reasoning_content == "The user just sent ping..."
    assert result.choices[0].message.content == "\n\npong"

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)



Development

Successfully merging this pull request may close these issues.

[Bug]: reasoning_content not parsed from raw <think> tags for nvidia_nim/minimax-m2.5 via LiteLLM Proxy

2 participants