
AssistantMessageItem has empty content in RealtimeHistoryUpdated/RealtimeHistoryAdded events #2161

@Wabbrik


Describe the bug

In TEXT mode, AssistantMessageItem has empty content in RealtimeHistoryUpdated/RealtimeHistoryAdded events

When using the Realtime API in TEXT mode (modalities: ["text"]), the AssistantMessageItem objects delivered in RealtimeHistoryUpdated and RealtimeHistoryAdded events have empty content arrays, even though the assistant has responded with text.

In VOICE mode (modalities: ["audio"]), the AssistantMessageItem correctly contains the audio transcript.

Root Cause
The bug is in openai_realtime.py, in the _handle_ws_event() method (around lines 550-560).

When processing response.output_item.done events, the SDK checks the type of each content part:

if part.get("type") == "audio":
    converted_content.append({
        "type": "audio",
        "audio": part.get("audio"),
        "transcript": part.get("transcript"),
    })
elif part.get("type") == "text":
    converted_content.append({"type": "text", "text": part.get("text")})

Problem: in TEXT mode the Realtime API sends content parts with type: "output_text", not type: "text", so neither branch matches.

The SDK already handles this conversion correctly in _ConversionHelper.conversation_item_to_realtime_message_item() (lines 949-954):

if each.type == "output_text":
    # For backward-compatibility of assistant message items
    c["type"] = "text"

But this conversion is missing from _handle_ws_event(), so TEXT mode content is silently dropped.

The fix seems simple enough. Change:

elif part.get("type") == "text":
    converted_content.append({"type": "text", "text": part.get("text")})

to

elif part.get("type") in ("text", "output_text"):
    converted_content.append({"type": "text", "text": part.get("text")})
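To check the proposed change in isolation, here is a minimal, self-contained sketch that copies the quoted branch into two standalone functions: one as it appears in 0.6.2 and one with the `in ("text", "output_text")` change. The function names, the loop over content parts, and the sample payload are assumptions for illustration; this is not the SDK's actual surrounding code.

```python
from typing import Any, Dict, List


def convert_parts_current(parts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy of the quoted branch from _handle_ws_event() (SDK 0.6.2)."""
    converted_content: List[Dict[str, Any]] = []
    for part in parts:
        if part.get("type") == "audio":
            converted_content.append({
                "type": "audio",
                "audio": part.get("audio"),
                "transcript": part.get("transcript"),
            })
        elif part.get("type") == "text":
            converted_content.append({"type": "text", "text": part.get("text")})
        # Anything else, including "output_text", falls through and is dropped.
    return converted_content


def convert_parts_fixed(parts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Same branch with the proposed change: accept "output_text" too."""
    converted_content: List[Dict[str, Any]] = []
    for part in parts:
        if part.get("type") == "audio":
            converted_content.append({
                "type": "audio",
                "audio": part.get("audio"),
                "transcript": part.get("transcript"),
            })
        elif part.get("type") in ("text", "output_text"):
            # Normalize to "text", as conversation_item_to_realtime_message_item() does.
            converted_content.append({"type": "text", "text": part.get("text")})
    return converted_content


# Illustrative TEXT mode content part (hand-written, not a captured server event).
text_mode_parts = [{"type": "output_text", "text": "Hello!"}]

print(convert_parts_current(text_mode_parts))  # []
print(convert_parts_fixed(text_mode_parts))    # [{'type': 'text', 'text': 'Hello!'}]
```

Mapping "output_text" to "text" keeps the resulting item shape the same one that conversation_item_to_realtime_message_item() already produces, so downstream code that builds AssistantText from a "text" part should not need to change.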

Debug information

  • Agents SDK version: 0.6.2
  • Python version (e.g. Python 3.11)

Repro steps

  1. Create a RealtimeRunner with TEXT modality
  2. Send a user message and wait for the assistant response
  3. Listen for RealtimeHistoryUpdated or RealtimeHistoryAdded events
  4. Inspect the AssistantMessageItem: its content array is empty

Expected behavior

The AssistantMessageItem.content should contain an AssistantText object with the response text, similar to how VOICE mode contains AssistantAudio with the transcript.

Extra (our workaround)

Subscribe to raw model events and extract text from response.output_text.done:

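    # Inside the session's event loop: pick the final text out of raw server events
    if isinstance(event, RealtimeRawModelEvent):
        if isinstance(event.data, RealtimeModelRawServerEvent):
            data = event.data.data  # raw server event payload (a dict)
            if data.get("type") == "response.output_text.done":
                item_id = data.get("item_id")  # ID of the assistant message item
                text = data.get("text")  # full text of the completed content part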

(plus extra state management across the generic, modality-agnostic code path to fill in the text that is missing from AssistantMessageItem when the modality is TEXT)
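The extra state management can be sketched as a small cache keyed by item_id: record the text from each raw response.output_text.done event, then fall back to it whenever a history item arrives with no text. The class AssistantTextCache and its methods are hypothetical (not part of the Agents SDK), and plain dicts stand in for the SDK's event payloads and content models; treat this as a sketch under those assumptions.

```python
from typing import Any, Dict, List, Optional


class AssistantTextCache:
    """Hypothetical helper (not part of the Agents SDK).

    Remembers the final text from raw "response.output_text.done" events,
    keyed by item_id, so it can stand in for an empty AssistantMessageItem.content.
    """

    def __init__(self) -> None:
        self._parts_by_item_id: Dict[str, List[str]] = {}

    def observe_raw_event(self, data: Dict[str, Any]) -> None:
        # Ignore every raw server event except the one carrying finished text.
        if data.get("type") != "response.output_text.done":
            return
        item_id = data.get("item_id")
        text = data.get("text")
        if item_id is None or text is None:
            return
        # An item may have more than one text part; keep them in arrival order.
        self._parts_by_item_id.setdefault(item_id, []).append(text)

    def text_for(self, item_id: str, content: List[Dict[str, Any]]) -> Optional[str]:
        # Prefer text the history item already carries (e.g. once the SDK bug
        # is fixed); otherwise fall back to the cached raw-event text.
        for part in content:
            if part.get("type") == "text" and part.get("text"):
                return part["text"]
        parts = self._parts_by_item_id.get(item_id)
        return "".join(parts) if parts else None


cache = AssistantTextCache()
cache.observe_raw_event(
    {"type": "response.output_text.done", "item_id": "item_1", "text": "Hi there"}
)
print(cache.text_for("item_1", []))  # Hi there
```

Checking the item's own content first means the workaround becomes a no-op once the SDK populates AssistantText itself.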

Metadata


Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
