fix(nvidia_nim): extract <think> reasoning blocks from content in transform_response #24276

NIK-TIGER-BILL wants to merge 2 commits into BerriAI:main
Conversation
Fixes BerriAI#24234

The embedding transformer already strips the `openrouter/` prefix before sending the model name to the API (see `litellm/llms/openrouter/embedding/transformation.py:112-113`). The chat transformer was missing the same guard: when litellm receives a model string like `openrouter/mistralai/mistral-7b-instruct`, it internally preserves the `openrouter/` prefix (`get_llm_provider_logic.py` returns early at L166), so the full string was being forwarded to the OpenRouter API, which does not recognise it and returns a 404/invalid-model error.

Fix: strip the `openrouter/` prefix at the top of `OpenrouterConfig.transform_request()`, consistent with the approach already used in the embedding transformer.
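The guard itself is a one-liner; a minimal standalone sketch of the prefix-stripping idea (illustrative only, not the actual litellm code — the function name is hypothetical):

```python
def strip_provider_prefix(model: str, prefix: str = "openrouter/") -> str:
    """Return the model id without the routing prefix litellm uses internally."""
    if model.startswith(prefix):
        return model[len(prefix):]
    return model

# "openrouter/mistralai/mistral-7b-instruct" -> "mistralai/mistral-7b-instruct"
# An already-bare model id passes through unchanged.
```

Making the strip idempotent (a no-op on already-bare ids) is what keeps the chat and embedding paths consistent regardless of which layer normalised the name first.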
…nsform_response Fixes BerriAI#24253 NVIDIA NIM reasoning models (e.g. minimax/minimax-m1) return their chain-of-thought inside a raw <think>…</think> block embedded in choices[0].message.content. The litellm response pipeline already calls _parse_content_for_reasoning() via _extract_reasoning_content() in convert_to_model_response_object(), but a regression in the proxy path can prevent that from running for nvidia_nim before the response is surfaced to the caller — leaving reasoning_content=null and the raw <think> tags in content. Fix: override transform_response() in NvidiaNimConfig to explicitly scan the content after the parent class has finished building the ModelResponse. If reasoning_content is still None and content contains a <think>…</think> block, extract it using the shared _parse_content_for_reasoning() helper. Setting thinking: true / merge_reasoning_content_in_choices: false in the proxy model config will now behave consistently with other reasoning providers (DeepSeek R1 via OpenRouter, etc.).
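The extraction amounts to splitting a leading `<think>…</think>` block off the rest of the message. A minimal regex sketch of that behaviour (illustrative only — this is not the actual `_parse_content_for_reasoning` implementation, and the function name here is hypothetical):

```python
import re
from typing import Optional, Tuple

# DOTALL so the reasoning block may span multiple lines.
_THINK_RE = re.compile(r"^<think>(.*?)</think>(.*)$", re.DOTALL)

def split_think_block(content: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, remaining_content); reasoning is None if no block found."""
    m = _THINK_RE.match(content)
    if m is None:
        return None, content
    return m.group(1), m.group(2)
```

Note the non-greedy `(.*?)` so that only the first closing `</think>` terminates the reasoning block.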
NIK-TIGER-BILL seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Already signed the CLA but the status is still pending? Let us recheck it.
Greptile Summary

This PR makes two independent fixes: (1) adds a `transform_response` override in `NvidiaNimConfig` that extracts `<think>…</think>` blocks from message content into `reasoning_content`, and (2) strips the `openrouter/` prefix in `OpenrouterConfig.transform_request` before the model name is sent to the API.

Key changes:
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/llms/nvidia_nim/chat/transformation.py | Adds a transform_response override that post-processes the parent's ModelResponse to extract <think>…</think> blocks into reasoning_content. Logic is correct but an unused cast import remains, and no unit tests were added to verify the fix. |
| litellm/llms/openrouter/chat/transformation.py | Adds openrouter/ prefix-stripping in transform_request before the model name is sent to the API. The async path is also covered because OpenAIGPTConfig.async_transform_request delegates to self.transform_request for non-base-class instances. Change looks correct. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant NvidiaNimConfig
    participant OpenAIGPTConfig
    participant _parse_content_for_reasoning

    Caller->>NvidiaNimConfig: transform_response(raw_response, ...)
    NvidiaNimConfig->>OpenAIGPTConfig: super().transform_response(...)
    OpenAIGPTConfig-->>NvidiaNimConfig: ModelResponse (reasoning_content=None, content="<think>...</think> answer")
    loop Each choice in result
        NvidiaNimConfig->>NvidiaNimConfig: check reasoning_content is None
        NvidiaNimConfig->>NvidiaNimConfig: check content is str
        NvidiaNimConfig->>_parse_content_for_reasoning: _parse_content_for_reasoning(content)
        _parse_content_for_reasoning-->>NvidiaNimConfig: (reasoning, stripped_content)
        NvidiaNimConfig->>NvidiaNimConfig: message.reasoning_content = reasoning
        NvidiaNimConfig->>NvidiaNimConfig: message.content = stripped_content
    end
    NvidiaNimConfig-->>Caller: ModelResponse (reasoning_content="...", content="answer")
```
Last reviewed commit: "fix(nvidia_nim): ext..."
```python
    API calling is done using the OpenAI SDK with an api_base
    """

from typing import Any, List, Optional, cast
```

```python
    def transform_response(  # type: ignore[override]
        self,
        model: str,
        raw_response: httpx.Response,
        model_response: ModelResponse,
        logging_obj: Any,
        request_data: dict,
        messages: List[Any],
        optional_params: dict,
        litellm_params: dict,
        encoding: Any,
        api_key: Optional[str] = None,
        json_mode: Optional[bool] = None,
    ) -> ModelResponse:
        """
        Override transform_response to extract <think>…</think> reasoning blocks
        produced by NVIDIA NIM reasoning models (e.g. minimax/minimax-m1).

        NIM forwards the raw model output as a plain OpenAI-compatible response:
        the <think> block appears inside ``choices[0].message.content`` and
        ``reasoning_content`` is absent. The parent class's
        ``_extract_reasoning_content`` helper already calls
        ``_parse_content_for_reasoning``, but only after the response has been
        deserialized. We repeat the extraction here on the built ModelResponse
        so that the upstream consumer always sees a clean split between
        ``reasoning_content`` and ``content``.
        """
        # Let the parent class build the base ModelResponse first.
        result = super().transform_response(
            model=model,
            raw_response=raw_response,
            model_response=model_response,
            logging_obj=logging_obj,
            request_data=request_data,
            messages=messages,
            optional_params=optional_params,
            litellm_params=litellm_params,
            encoding=encoding,
            api_key=api_key,
            json_mode=json_mode,
        )

        # Post-process: if reasoning_content is still None but the content
        # contains raw <think>…</think> tags, extract them now.
        for choice in getattr(result, "choices", []):
            message = getattr(choice, "message", None)
            if message is None:
                continue
            if getattr(message, "reasoning_content", None) is not None:
                # Already parsed; nothing to do.
                continue
            content = getattr(message, "content", None)
            if not isinstance(content, str):
                continue
            reasoning, stripped_content = _parse_content_for_reasoning(content)
            if reasoning is not None:
                message.reasoning_content = reasoning
                message.content = stripped_content

        return result
```
No unit test for the new reasoning extraction
This PR claims to fix issue #24253 but the tests/llm_translation/test_nvidia_nim.py file has no new test that covers the transform_response reasoning-extraction path. Without a mock-based test (e.g. constructing a fake httpx.Response whose body contains <think>…</think> and asserting reasoning_content / content are correctly split), there is no automated verification that the fix works and no regression guard.
Per the project's review standards, a fix claiming to resolve a reported issue should include passing tests as evidence. Please add a unit test similar to the following pattern:
```python
def test_nvidia_nim_transform_response_extracts_reasoning():
    import json
    from unittest.mock import MagicMock

    import httpx

    from litellm import ModelResponse
    from litellm.llms.nvidia_nim.chat.transformation import NvidiaNimConfig

    config = NvidiaNimConfig()
    fake_body = {
        "id": "test-id",
        "object": "chat.completion",
        "created": 1234567890,
        "model": "minimax/minimax-m1",
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": "<think>The user just sent ping...</think>\n\npong",
                },
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
    }
    raw_response = MagicMock(spec=httpx.Response)
    raw_response.text = json.dumps(fake_body)
    raw_response.json.return_value = fake_body
    raw_response.status_code = 200
    raw_response.headers = {}

    result = config.transform_response(
        model="minimax/minimax-m1",
        raw_response=raw_response,
        model_response=ModelResponse(),
        logging_obj=MagicMock(),
        request_data={},
        messages=[],
        optional_params={},
        litellm_params={},
        encoding=None,
    )

    assert result.choices[0].message.reasoning_content == "The user just sent ping..."
    assert result.choices[0].message.content == "\n\npong"
```

Rule used: "What: Ensure that any PR claiming to fix an issue ..."
Summary
Fixes #24253
Problem

NVIDIA NIM reasoning models (e.g. `minimax/minimax-m1`) return their chain-of-thought inside a raw `<think>…</think>` block in `choices[0].message.content`. The expected behaviour is for litellm to split this into `reasoning_content` + `content`, as it does for DeepSeek R1 and other reasoning providers. Instead, `reasoning_content` is `null` and the raw tags appear in `content`.

Root cause

The litellm response pipeline already calls `_parse_content_for_reasoning()` via `_extract_reasoning_content()` in `convert_to_model_response_object()`. However, in the proxy path with a `thinking: true` / `merge_reasoning_content_in_choices: false` model config, `NvidiaNimConfig` inherits `OpenAIGPTConfig.transform_response()`, which routes through `convert_to_model_response_object()`, but that function only invokes `_extract_reasoning_content()` when the choice message dictionary has a string `content`. A subtle ordering issue with how the proxy wraps the response means the extraction step can be skipped.

Fix

Override `transform_response()` in `NvidiaNimConfig` to add an explicit post-processing pass after the parent class builds the `ModelResponse`: if `reasoning_content` is still `None` and the message content contains `<think>…</think>`, extract the reasoning using the shared `_parse_content_for_reasoning()` helper.

After fix

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "\n\npong",
        "reasoning_content": "The user just sent ping..."
      }
    }
  ]
}
```
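For context, a proxy model entry exercising this path might look roughly like the following. This is a hedged sketch: the model name, `api_base`, and the exact placement of the `thinking` / `merge_reasoning_content_in_choices` flags within the entry are assumptions, not confirmed against the proxy config schema.

```yaml
model_list:
  - model_name: nim-minimax          # alias exposed by the proxy (placeholder)
    litellm_params:
      model: nvidia_nim/minimax/minimax-m1
      api_base: https://integrate.api.nvidia.com/v1   # assumed NIM endpoint
      api_key: os.environ/NVIDIA_NIM_API_KEY
      thinking: true                                  # placement assumed
      merge_reasoning_content_in_choices: false       # placement assumed
```

With the fix, responses for this entry should carry the split `reasoning_content` / `content` shown in the "After fix" JSON above, matching other reasoning providers.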