
fix(nvidia_nim): extract <think> reasoning blocks from content in transform_response#24276

Open
NIK-TIGER-BILL wants to merge 2 commits into BerriAI:main from NIK-TIGER-BILL:fix/nvidia-nim-parse-think-tags

Conversation

@NIK-TIGER-BILL

Summary

Fixes #24253

Problem

NVIDIA NIM reasoning models (e.g. minimax/minimax-m1) return their chain-of-thought inside a raw <think>…</think> block in choices[0].message.content. The expected behaviour is for litellm to split this into reasoning_content + content, as it does for DeepSeek R1 and other reasoning providers. Instead, reasoning_content is null and the raw tags appear in content.
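For illustration, the expected split can be reproduced with a small regex sketch. This is not litellm's actual helper (`_parse_content_for_reasoning()` lives in the shared response utilities); `split_think_block` and `THINK_RE` are hypothetical names used only to show the intended behaviour:

```python
import re

# Illustrative only: a leading <think>…</think> block is peeled off into
# reasoning text, and the remainder becomes the visible content.
THINK_RE = re.compile(r"^<think>(.*?)</think>", re.DOTALL)

def split_think_block(content: str):
    """Return (reasoning, visible_content) for a raw model response string."""
    match = THINK_RE.match(content)
    if match is None:
        return None, content  # no reasoning block present
    return match.group(1), content[match.end():]

reasoning, content = split_think_block(
    "<think>The user just sent ping...</think>\n\npong"
)
# reasoning -> "The user just sent ping..."
# content   -> "\n\npong"
```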

Root cause

The litellm response pipeline already calls _parse_content_for_reasoning() via _extract_reasoning_content() in convert_to_model_response_object(). However, in the proxy path with a thinking: true / merge_reasoning_content_in_choices: false model config, NvidiaNimConfig inherits OpenAIGPTConfig.transform_response(), which routes through convert_to_model_response_object(). That function only invokes _extract_reasoning_content() when the choice's message dictionary has a string content, and a subtle ordering issue in how the proxy wraps the response means the extraction step can be skipped for nvidia_nim.

Fix

Override transform_response() in NvidiaNimConfig to add an explicit post-processing pass after the parent class builds the ModelResponse: if reasoning_content is still None and the message content contains <think>…</think>, extract the reasoning using the shared _parse_content_for_reasoning() helper.
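A minimal standalone sketch of that post-processing pass, under stated assumptions: `parse_reasoning` and the `Message` class below are hypothetical stand-ins, while the real override operates on the litellm ModelResponse and calls the shared `_parse_content_for_reasoning()` helper.

```python
from typing import List, Optional

def parse_reasoning(content: str):
    # Stand-in for litellm's _parse_content_for_reasoning() helper.
    start, end = "<think>", "</think>"
    if content.startswith(start) and end in content:
        reasoning, _, rest = content[len(start):].partition(end)
        return reasoning, rest
    return None, content

class Message:
    # Stand-in for the choice message object on a ModelResponse.
    def __init__(self, content: str, reasoning_content: Optional[str] = None):
        self.content = content
        self.reasoning_content = reasoning_content

def post_process(messages: List[Message]) -> List[Message]:
    for message in messages:
        if message.reasoning_content is not None:
            continue  # already parsed upstream; nothing to do
        if not isinstance(message.content, str):
            continue
        reasoning, stripped = parse_reasoning(message.content)
        if reasoning is not None:
            message.reasoning_content = reasoning
            message.content = stripped
    return messages

msg = post_process([Message("<think>ping...</think>\n\npong")])[0]
# msg.reasoning_content -> "ping...", msg.content -> "\n\npong"
```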

After fix

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "\n\npong",
      "reasoning_content": "The user just sent ping..."
    }
  }]
}

NIK-TIGER-BILL added 2 commits March 21, 2026 07:05
Fixes BerriAI#24234

The embedding transformer already strips the 'openrouter/' prefix before
sending the model name to the API (see
litellm/llms/openrouter/embedding/transformation.py:112-113). The chat
transformer was missing the same guard: when litellm receives a model
string like
guard: when litellm receives a model string like
'openrouter/mistralai/mistral-7b-instruct' it internally preserves the
'openrouter/' prefix (get_llm_provider_logic.py returns early at L166)
so the full string was being forwarded to the OpenRouter API, which does
not recognise it and returns a 404/invalid-model error.

Fix: strip the 'openrouter/' prefix at the top of
OpenrouterConfig.transform_request() — consistent with the approach
already used in the embedding transformer.
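A standalone sketch of the guard described in this commit message; `strip_provider_prefix` is a hypothetical name for illustration, while the actual change lives inline in OpenrouterConfig.transform_request():

```python
OPENROUTER_PREFIX = "openrouter/"

def strip_provider_prefix(model: str) -> str:
    # Drop litellm's internal provider prefix so the raw model ID
    # (e.g. "mistralai/mistral-7b-instruct") reaches the OpenRouter API.
    if model.startswith(OPENROUTER_PREFIX):
        return model[len(OPENROUTER_PREFIX):]
    return model

print(strip_provider_prefix("openrouter/mistralai/mistral-7b-instruct"))
# prints "mistralai/mistral-7b-instruct"
```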
…nsform_response

Fixes BerriAI#24253

NVIDIA NIM reasoning models (e.g. minimax/minimax-m1) return their
chain-of-thought inside a raw <think>…</think> block embedded in
choices[0].message.content.  The litellm response pipeline already
calls _parse_content_for_reasoning() via _extract_reasoning_content()
in convert_to_model_response_object(), but a regression in the proxy
path can prevent that from running for nvidia_nim before the response
is surfaced to the caller — leaving reasoning_content=null and the raw
<think> tags in content.

Fix: override transform_response() in NvidiaNimConfig to explicitly
scan the content after the parent class has finished building the
ModelResponse.  If reasoning_content is still None and content
contains a <think>…</think> block, extract it using the shared
_parse_content_for_reasoning() helper.

Setting thinking: true / merge_reasoning_content_in_choices: false in
the proxy model config will now behave consistently with other
reasoning providers (DeepSeek R1 via OpenRouter, etc.).
@vercel

vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 21, 2026 7:23am


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


NIK-TIGER-BILL seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Have you signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq
Contributor

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing NIK-TIGER-BILL:fix/nvidia-nim-parse-think-tags (79a5629) with main (d8e4fc4)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR makes two independent fixes: (1) adds a transform_response override on NvidiaNimConfig that post-processes the parent-built ModelResponse to extract <think>…</think> reasoning blocks from content into reasoning_content, and (2) strips the openrouter/ provider-prefix from the model name in OpenrouterConfig.transform_request before forwarding it to the OpenRouter API.

Key changes:

  • NvidiaNimConfig.transform_response: calls super().transform_response() first, then iterates over choices — if reasoning_content is None and the content string contains a <think> block, it calls the shared _parse_content_for_reasoning() helper to split the two fields. The approach reuses existing infra and doesn't hard-code model names.
  • OpenrouterConfig.transform_request: adds a model.startswith("openrouter/") guard to strip the litellm-internal prefix so the raw model ID reaches the API. The async path is safe because OpenAIGPTConfig.async_transform_request delegates to self.transform_request for subclasses.
  • An unused cast import was added to nvidia_nim/chat/transformation.py — should be removed.
  • No unit test was added for the new reasoning-extraction logic in NvidiaNimConfig, which is required by the project's review standards to demonstrate that the reported issue is resolved.

Confidence Score: 3/5

  • PR is safe to merge for existing non-reasoning NIM models, but lacks automated test coverage for the new reasoning-extraction path.
  • The core logic is straightforward and correctly reuses an existing helper (_parse_content_for_reasoning). The guard (reasoning_content is not None → skip) prevents double-processing. However, no unit test was added for the new code path, leaving the fix unverified by CI. The unused cast import is trivial. Confidence is lowered mainly because of missing test coverage for the stated fix.
  • litellm/llms/nvidia_nim/chat/transformation.py — needs a unit test for the reasoning extraction path

Important Files Changed

Filename Overview
litellm/llms/nvidia_nim/chat/transformation.py Adds a transform_response override that post-processes the parent's ModelResponse to extract <think>…</think> blocks into reasoning_content. Logic is correct but an unused cast import remains, and no unit tests were added to verify the fix.
litellm/llms/openrouter/chat/transformation.py Adds openrouter/ prefix-stripping in transform_request before the model name is sent to the API. The async path is also covered because OpenAIGPTConfig.async_transform_request delegates to self.transform_request for non-base-class instances. Change looks correct.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant NvidiaNimConfig
    participant OpenAIGPTConfig
    participant _parse_content_for_reasoning

    Caller->>NvidiaNimConfig: transform_response(raw_response, ...)
    NvidiaNimConfig->>OpenAIGPTConfig: super().transform_response(...)
    OpenAIGPTConfig-->>NvidiaNimConfig: ModelResponse (reasoning_content=None, content="<think>...</think> answer")
    loop Each choice in result
        NvidiaNimConfig->>NvidiaNimConfig: check reasoning_content is None
        NvidiaNimConfig->>NvidiaNimConfig: check content is str
        NvidiaNimConfig->>_parse_content_for_reasoning: _parse_content_for_reasoning(content)
        _parse_content_for_reasoning-->>NvidiaNimConfig: (reasoning, stripped_content)
        NvidiaNimConfig->>NvidiaNimConfig: message.reasoning_content = reasoning
        NvidiaNimConfig->>NvidiaNimConfig: message.content = stripped_content
    end
    NvidiaNimConfig-->>Caller: ModelResponse (reasoning_content="...", content="answer")

Last reviewed commit: "fix(nvidia_nim): ext..."

API calling is done using the OpenAI SDK with an api_base
"""
from typing import Any, List, Optional, cast
Contributor


P2 Unused cast import

cast is imported from typing but never referenced anywhere in this file. Remove it to keep the import block clean.

Suggested change
from typing import Any, List, Optional, cast
from typing import Any, List, Optional

Comment on lines +120 to +179
def transform_response(  # type: ignore[override]
    self,
    model: str,
    raw_response: httpx.Response,
    model_response: ModelResponse,
    logging_obj: Any,
    request_data: dict,
    messages: List[Any],
    optional_params: dict,
    litellm_params: dict,
    encoding: Any,
    api_key: Optional[str] = None,
    json_mode: Optional[bool] = None,
) -> ModelResponse:
    """
    Override transform_response to extract <think>…</think> reasoning blocks
    produced by NVIDIA NIM reasoning models (e.g. minimax/minimax-m1).
    NIM forwards the raw model output as a plain OpenAI-compatible response:
    the <think> block appears inside ``choices[0].message.content`` and
    ``reasoning_content`` is absent. The parent class's
    ``_extract_reasoning_content`` helper already calls
    ``_parse_content_for_reasoning``, but that step can be skipped in the
    proxy path. We run the same parsing here on the built ModelResponse so
    that the upstream consumer always sees a clean split between
    ``reasoning_content`` and ``content``.
    """
    # Let the parent class build the base ModelResponse first.
    result = super().transform_response(
        model=model,
        raw_response=raw_response,
        model_response=model_response,
        logging_obj=logging_obj,
        request_data=request_data,
        messages=messages,
        optional_params=optional_params,
        litellm_params=litellm_params,
        encoding=encoding,
        api_key=api_key,
        json_mode=json_mode,
    )

    # Post-process: if reasoning_content is still None but the content
    # contains <think>…</think> raw tags, extract them now.
    for choice in getattr(result, "choices", []):
        message = getattr(choice, "message", None)
        if message is None:
            continue
        if getattr(message, "reasoning_content", None) is not None:
            # Already parsed, nothing to do.
            continue
        content = getattr(message, "content", None)
        if not isinstance(content, str):
            continue
        reasoning, stripped_content = _parse_content_for_reasoning(content)
        if reasoning is not None:
            message.reasoning_content = reasoning
            message.content = stripped_content

    return result
Contributor


P1 No unit test for the new reasoning extraction

This PR claims to fix issue #24253 but the tests/llm_translation/test_nvidia_nim.py file has no new test that covers the transform_response reasoning-extraction path. Without a mock-based test (e.g. constructing a fake httpx.Response whose body contains <think>…</think> and asserting reasoning_content / content are correctly split), there is no automated verification that the fix works and no regression guard.

Per the project's review standards, a fix claiming to resolve a reported issue should include passing tests as evidence. Please add a unit test similar to the following pattern:

def test_nvidia_nim_transform_response_extracts_reasoning():
    from unittest.mock import MagicMock
    import json, httpx
    from litellm.llms.nvidia_nim.chat.transformation import NvidiaNimConfig

    config = NvidiaNimConfig()
    fake_body = {
        "id": "test-id",
        "object": "chat.completion",
        "created": 1234567890,
        "model": "minimax/minimax-m1",
        "choices": [{"index": 0, "message": {"role": "assistant", "content": "<think>The user just sent ping...</think>\n\npong"}, "finish_reason": "stop"}],
        "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
    }
    raw_response = MagicMock(spec=httpx.Response)
    raw_response.text = json.dumps(fake_body)
    raw_response.json.return_value = fake_body
    raw_response.status_code = 200
    raw_response.headers = {}

    from litellm import ModelResponse
    result = config.transform_response(
        model="minimax/minimax-m1",
        raw_response=raw_response,
        model_response=ModelResponse(),
        logging_obj=MagicMock(),
        request_data={},
        messages=[],
        optional_params={},
        litellm_params={},
        encoding=None,
    )
    assert result.choices[0].message.reasoning_content == "The user just sent ping..."
    assert result.choices[0].message.content == "\n\npong"

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)



Development

Successfully merging this pull request may close these issues.

[Bug]: reasoning_content not parsed from raw <think> tags for nvidia_nim/minimax-m2.5 via LiteLLM Proxy

2 participants