@neubig neubig commented Nov 12, 2025

Overview

Add support for MiniMax-M2 interleaved thinking when used via OpenRouter by adding openrouter/minimax-m2 to SEND_REASONING_CONTENT_PATTERNS.

Problem

MiniMax-M2 uses "interleaved thinking" where reasoning is mixed with responses across multi-turn conversations. When using MiniMax-M2 through OpenRouter, the reasoning content must be preserved and sent back in subsequent requests to maintain the model's chain of thought. Without this preservation, users reported significant performance degradation (up to 40% accuracy drop on complex tasks).

Solution

How It Works

  1. OpenRouter normalization: OpenRouter receives MiniMax-M2's response and normalizes reasoning to a reasoning field
  2. LiteLLM transformation: LiteLLM maps OpenRouter's reasoning field to reasoning_content
    # From litellm/llms/openrouter/chat/transformation.py
    choice["delta"]["reasoning_content"] = choice["delta"].get("reasoning")
  3. SDK captures it: The SDK's Message.from_llm_chat_message() automatically captures reasoning_content
  4. This PR enables sending it back: By adding to SEND_REASONING_CONTENT_PATTERNS, the SDK will send reasoning_content in subsequent requests

Why openrouter/minimax-m2 Specifically?

The patterns are matched by substring containment, so:

  • "openrouter/minimax-m2" matches only openrouter/minimax-m2
  • "minimax-m2" would match minimax-m2, openrouter/minimax-m2, groq/minimax-m2, etc.

Using the specific pattern avoids affecting direct API users who may use:

  • Anthropic-compatible mode (uses thinking_blocks)
  • OpenAI-compatible mode with <think> tags
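As a rough sketch of this substring check (the helper name should_send_reasoning_content is hypothetical; only the SEND_REASONING_CONTENT_PATTERNS name comes from model_features.py):

```python
# Illustrative sketch of substring-based pattern matching.
# The real list lives in model_features.py; the helper name is invented here.
SEND_REASONING_CONTENT_PATTERNS = ["openrouter/minimax-m2"]


def should_send_reasoning_content(model: str) -> bool:
    """Return True if any pattern occurs as a substring of the model id."""
    return any(pattern in model for pattern in SEND_REASONING_CONTENT_PATTERNS)


print(should_send_reasoning_content("openrouter/minimax-m2"))  # True
print(should_send_reasoning_content("minimax-m2"))             # False: direct API unaffected
print(should_send_reasoning_content("groq/minimax-m2"))        # False: other providers unaffected
```

Had the broader pattern "minimax-m2" been used instead, all three calls above would return True, which is exactly the over-matching this PR avoids.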

Changes

  • Added "openrouter/minimax-m2" to SEND_REASONING_CONTENT_PATTERNS in model_features.py

Testing

Verified with pre-commit hooks:

  • ✅ Ruff format
  • ✅ Ruff lint
  • ✅ pycodestyle
  • ✅ pyright type checking

Manual Testing Scenarios

Users should test:

  1. Multi-turn conversations with model="openrouter/minimax-m2"
  2. Verify reasoning content is preserved in message history
  3. Compare performance with/without reasoning preservation
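To make scenario 2 concrete, here is a hedged sketch of what a preserved multi-turn history might look like (the reasoning_content field name follows LiteLLM's normalization; the message text is invented for illustration):

```python
# Hypothetical multi-turn history with reasoning content preserved.
# With "openrouter/minimax-m2" in SEND_REASONING_CONTENT_PATTERNS, the
# assistant turn keeps its reasoning_content when the history is sent
# back in the next request, maintaining the model's chain of thought.
history = [
    {"role": "user", "content": "Plan the refactor."},
    {
        "role": "assistant",
        "content": "First, extract the helper module.",
        # Captured from the previous response via reasoning_content.
        "reasoning_content": "The user wants a staged refactor; start small.",
    },
    {"role": "user", "content": "Do step one."},
]

# Without this PR, the assistant turn would be sent back without
# reasoning_content, breaking the interleaved thinking chain.
assert "reasoning_content" in history[1]
```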

Impact

  • Performance improvement: Maintains MiniMax-M2's reasoning chain across turns
  • Critical for agentic workflows: Especially important for tasks requiring 200-300+ tool calls
  • No breaking changes: Only affects openrouter/minimax-m2 model identifier

References

Related Work

This follows the same pattern as Kimi K2-Thinking support, which was the first model added to SEND_REASONING_CONTENT_PATTERNS. Both models benefit from reasoning content preservation, though they achieve it through different API mechanisms.



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

  Variant   Architectures   Base Image                                   Docs / Tags
  java      amd64, arm64    eclipse-temurin:17-jdk                       Link
  python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22   Link
  golang    amd64, arm64    golang:1.21-bookworm                         Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:706d74a-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-706d74a-python \
  ghcr.io/openhands/agent-server:706d74a-python

All tags pushed for this build

ghcr.io/openhands/agent-server:706d74a-golang-amd64
ghcr.io/openhands/agent-server:706d74a-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:706d74a-golang-arm64
ghcr.io/openhands/agent-server:706d74a-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:706d74a-java-amd64
ghcr.io/openhands/agent-server:706d74a-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:706d74a-java-arm64
ghcr.io/openhands/agent-server:706d74a-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:706d74a-python-amd64
ghcr.io/openhands/agent-server:706d74a-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:706d74a-python-arm64
ghcr.io/openhands/agent-server:706d74a-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:706d74a-golang
ghcr.io/openhands/agent-server:706d74a-java
ghcr.io/openhands/agent-server:706d74a-python

About Multi-Architecture Support

  • Each variant tag (e.g., 706d74a-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 706d74a-python-amd64) are also available if needed

Add support for MiniMax-M2 interleaved thinking via OpenRouter.

When MiniMax-M2 is used through OpenRouter, LiteLLM normalizes
OpenRouter's 'reasoning' field to 'reasoning_content'. To maintain
the model's chain of thought across multi-turn conversations, this
reasoning content must be sent back in subsequent requests.

By adding 'openrouter/minimax-m2' to SEND_REASONING_CONTENT_PATTERNS,
the SDK will automatically preserve and send reasoning content, which
is critical for performance in agentic workflows.

The pattern is specific to OpenRouter to avoid affecting direct API
users who may use Anthropic-compatible mode (thinking_blocks) or
think tags.

References:
- MiniMax blog on interleaved thinking importance
- OpenRouter's reasoning tokens documentation
- LiteLLM's OpenRouter transformation implementation

Co-authored-by: openhands <[email protected]>
@li-boxuan

Is there any reason this is still WIP? and btw shall we evaluate it?
