
[Bugfix]: Fix structured output in multi-turn gpt-oss#34454

Open
bbrowning wants to merge 5 commits into vllm-project:main from bbrowning:gptoss-multiturn-reasoning-structured

Conversation

@bbrowning
Contributor

@bbrowning bbrowning commented Feb 12, 2026

Purpose

The logic in the gptoss_reasoning_parser that detects when the model has finished outputting reasoning content and has started outputting content to the final channel was inadvertently matching on final channel markers from previous messages in multi-turn scenarios. In practice this meant that vLLM started applying the grammar bitmasks to the entirety of the model's output prematurely in these multi-turn conversations, causing the model to deviate from its trained Harmony format and leading to empty or invalid outputs.
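
For illustration, a simplified sketch of the Harmony-format token stream the parser scans in a multi-turn request like the reproducer below (an approximation for readability, not an exact chat-template rendering):

<|start|>user<|message|>Respond with JSON only in the form {"response":"hello"}.<|end|>
<|start|>assistant<|channel|>final<|message|>{"response":"hello"}<|end|>
<|start|>user<|message|>Respond with JSON only in the form {"response":"bye"}.<|end|>
<|start|>assistant<|channel|>analysis<|message|>The user wants JSON only: ...

The <|channel|>final<|message|> marker on the second line belongs to the previous assistant message, but the backward search over the token ids still matched it while the model was only partway through the analysis channel of the current turn.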

This PR fixes the issue by never looking for the final channel marker in any message prior to the one the model is currently generating, so that we don't falsely conclude the model has started generating the final channel unless it's actually doing so during this turn of the conversation.
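
A minimal sketch of the fixed check, assuming Harmony's special tokens can be looked up as single token ids in the tokenizer vocab; this is an illustration of the idea, not the actual gptoss_reasoning_parser code:

def reasoning_ended(token_ids: list[int], vocab: dict[str, int]) -> bool:
    channel_id = vocab["<|channel|>"]
    final_id = vocab["final"]
    message_id = vocab["<|message|>"]
    end_id = vocab["<|end|>"]
    # Walk backwards from the newest token. Hitting <|end|> first means any
    # <|channel|>final<|message|> further back closed out a previous message,
    # so the current turn is still in the reasoning (analysis) channel.
    for i in range(len(token_ids) - 1, 1, -1):
        if token_ids[i] == end_id:
            return False
        if (token_ids[i - 2] == channel_id
                and token_ids[i - 1] == final_id
                and token_ids[i] == message_id):
            return True
    return False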

This bug existed prior to vLLM v0.13.0, but we didn't actually trip over it because the way we handled multi-turn conversation state with gpt-oss models was missing important tokens, which coincidentally kept those prior conversations from matching these token id checks. Once we fixed the multi-turn conversation state, structured output usage with things like json_object response formats started hitting this bug in the reasoning parser.

Fixes #32791

Test Plan

I added a unit test specifically to cover this case, following test-driven development: I ensured the test failed initially, applied my fix, and then ensured the test passed.
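
As a rough sketch of the shape of such a test (reusing the hypothetical reasoning_ended helper from the Purpose section above, with made-up token ids rather than the real gpt-oss vocabulary; the actual test lives in tests/reasoning/test_gptoss_reasoning_parser.py and differs):

def test_previous_turn_final_channel_does_not_end_reasoning():
    vocab = {"<|start|>": 1, "<|channel|>": 2, "<|message|>": 3,
             "<|end|>": 4, "final": 5, "analysis": 6}
    token_ids = [
        # previous assistant message on the final channel, already closed
        1, 2, 5, 3, 10, 11, 4,
        # current turn: analysis channel, still being generated
        1, 2, 6, 3, 12, 13,
    ]
    assert not reasoning_ended(token_ids, vocab)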

The existing and new unit tests that touch this code were run via:

pytest tests/reasoning/test_gptoss_reasoning_parser.py
pytest tests/v1/structured_output/test_gptoss_structural_tags.py
pytest tests/entrypoints/openai/test_gptoss_structural_tags_integration.py

Additionally, I ran the manual reproducer (labeled as case 3) in #32791:

vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {
        "role": "user",
        "content": "Respond with JSON only in the form {\"response\":\"hello\"}."
      },
      {
        "role": "assistant",
        "content": "{\"response\":\"hello\"}"
      },
      {
        "role": "user",
        "content": "Respond with JSON only in the form {\"response\":\"bye\"}."
      }
    ],
    "response_format": { "type": "json_object" },
    "max_tokens": 128,
    "temperature": 0
  }' | jq .

Test Result

All the unit tests passed.

For the manual curl test, prior to this change the response came back with null content:

{
  "id": "chatcmpl-81416dae965f4f7d",
  "object": "chat.completion",
  "created": 1770920903,
  "model": "openai/gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": null
      },
      "logprobs": null,
      "finish_reason": "stop",
...

After this change, the model gives the expected response:

{
  "id": "chatcmpl-9c7eb34a997d07e2",
  "object": "chat.completion",
  "created": 1770923019,
  "model": "openai/gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"response\":\"bye\"}",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": "The user wants JSON only: {\"response\":\"bye\"}. So output that."
      },
      "logprobs": null,
      "finish_reason": "stop",
...

@mergify mergify bot added the gpt-oss (Related to GPT-OSS models) and bug (Something isn't working) labels on Feb 12, 2026
@bbrowning bbrowning force-pushed the gptoss-multiturn-reasoning-structured branch from 65c163e to c851d60 on February 12, 2026 19:27
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a critical bug in the gptoss_reasoning_parser that caused premature termination of reasoning in multi-turn conversations, leading to incorrect structured outputs. The fix, which involves stopping the backward search for the end-of-reasoning marker upon encountering a message boundary from a previous turn, is logical and well-implemented. The inclusion of a specific unit test to cover this multi-turn scenario is a great addition and significantly improves the robustness of the parser. Overall, the changes are excellent and effectively resolve the described issue. I have one suggestion to further improve the robustness of the code.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Instead of .encode followed by taking the first token, it's cleaner to just directly use model_tokenizer.vocab to fetch single token ids.

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
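
For illustration, the shape of that change on a generic tokenizer object (attribute names as discussed in the review; not copied from the vLLM source):

def special_token_id(tokenizer, token: str) -> int:
    # Before: encode the string and take the first token id.
    #   return tokenizer.encode(token)[0]
    # After: fetch the single token id directly from the vocabulary.
    return tokenizer.vocab[token]
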
Member

@DarkLight1337 DarkLight1337 left a comment


Thanks for fixing!

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Feb 13, 2026
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) February 13, 2026 13:13
@github-actions github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Feb 13, 2026
CI discovered some additional tests that use gptoss_reasoning_parser but
with a mocked tokenizer. So, this adds a mocked `vocab` to that mock
tokenizer so that these tests also pass.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
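
A minimal sketch of that kind of mock, with placeholder token ids rather than the real gpt-oss vocabulary (exactly which tokens the parser looks up is an assumption here):

from unittest.mock import MagicMock

mock_tokenizer = MagicMock()
# The parser now reads single token ids from tokenizer.vocab, so the mock
# needs a vocab mapping for the special tokens it queries.
mock_tokenizer.vocab = {
    "<|start|>": 1,
    "<|channel|>": 2,
    "<|message|>": 3,
    "<|end|>": 4,
    "final": 5,
}
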
auto-merge was automatically disabled February 13, 2026 14:18

Head branch was pushed to by a user without write access

@bbrowning
Contributor Author

CI picked up some additional tests that use gptoss_reasoning_parser with a mocked tokenizer and that failed after the switch from .encode to .vocab. So, I pushed one more commit adding a vocab mock to those mock tokenizers, grepped the tests to make sure no other tests using gptoss_reasoning_parser need updating, and updated the test plan in the PR description to reflect running the 3 unit tests that touch this code:

pytest tests/reasoning/test_gptoss_reasoning_parser.py
pytest tests/v1/structured_output/test_gptoss_structural_tags.py
pytest tests/entrypoints/openai/test_gptoss_structural_tags_integration.py

The latter two had failed and were caught by CI, but are passing locally now.


Labels

bug: Something isn't working
gpt-oss: Related to GPT-OSS models
ready: ONLY add when PR is ready to merge/full CI is needed
structured-output
v1

Projects

Status: No status
Status: Ready

Development

Successfully merging this pull request may close these issues.

[Bug]: chat.completions returns content: null for GPT-OSS multi-turn with json_object

2 participants