fix(messages): prevent audio ID concatenation when merging chunks #8499

Open: 1 commit to be merged into base branch main

christian-bromann (Contributor)

Description

This PR fixes a bug where audio IDs were being incorrectly concatenated when streaming audio chunks from OpenAI's gpt-4o-audio-preview model, causing "Assistant audio not found" errors.

Problem

When streaming audio chunks, the _mergeDicts function in langchain-core/src/messages/base.ts was concatenating audio IDs, transforming:

  • "audio_6871323ab2fc8191b2cf516af96cd851" → "audio_6871323ab2fc8191b2cf516af96cd851audio_6871323ab2fc8191b2cf516af96cd851"

This resulted in invalid audio IDs that caused subsequent API requests to fail.
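
To make the failure mode concrete, here is a minimal sketch of a merge that concatenates every pair of strings. It is a simplified, hypothetical stand-in (`naiveMerge`, `Dict`, and `isDict` are illustration names, not the actual `_mergeDicts` implementation), and it assumes two streamed chunks both carry the same audio ID:

```typescript
// Hypothetical, simplified stand-in for a chunk merge; NOT the real _mergeDicts.
type Dict = Record<string, unknown>;

function isDict(value: unknown): value is Dict {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function naiveMerge(left: Dict, right: Dict): Dict {
  const merged: Dict = { ...left };
  for (const [key, incoming] of Object.entries(right)) {
    const existing = merged[key];
    if (typeof existing === "string" && typeof incoming === "string") {
      // Every string pair is concatenated: fine for streamed text, wrong for IDs.
      merged[key] = existing + incoming;
    } else if (isDict(existing) && isDict(incoming)) {
      merged[key] = naiveMerge(existing, incoming);
    } else {
      merged[key] = incoming;
    }
  }
  return merged;
}

// Assumption: both chunks repeat the same audio ID (shortened here for readability).
const chunkA: Dict = { audio: { id: "audio_abc", transcript: "Hel" } };
const chunkB: Dict = { audio: { id: "audio_abc", transcript: "lo" } };
const naiveResult = naiveMerge(chunkA, chunkB).audio as Dict;

console.log(naiveResult.id);         // "audio_abcaudio_abc"
console.log(naiveResult.transcript); // "Hello"
```

The transcript benefits from concatenation, while the ID is corrupted by exactly the same rule.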

Solution

Added a targeted fix to preserve the first non-empty value for audio ID fields instead of concatenating them:

// Special case: preserve the first non-empty value for audio.id field
// This prevents audio ID duplication when streaming audio chunks
if (key === "id" && merged[key] && merged[key].startsWith("audio_")) {
  continue;
}
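
For context, the guard above could sit inside a merge loop roughly as follows. This is a sketch under assumptions: `mergeKeepingAudioId`, `ChunkDict`, and `isChunkDict` are hypothetical names, the surrounding loop is simplified rather than copied from `base.ts`, and a `typeof` check is added here so that the sketch does not call `startsWith` on a non-string value.

```typescript
// Hypothetical, simplified merge loop; only the audio ID guard mirrors the PR.
type ChunkDict = Record<string, unknown>;

function isChunkDict(value: unknown): value is ChunkDict {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function mergeKeepingAudioId(left: ChunkDict, right: ChunkDict): ChunkDict {
  const merged: ChunkDict = { ...left };
  for (const [key, incoming] of Object.entries(right)) {
    const existing = merged[key];
    // Keep the first non-empty audio ID instead of concatenating it.
    // The typeof check is an addition made for this sketch.
    if (key === "id" && typeof existing === "string" && existing.startsWith("audio_")) {
      continue;
    }
    if (typeof existing === "string" && typeof incoming === "string") {
      merged[key] = existing + incoming;
    } else if (isChunkDict(existing) && isChunkDict(incoming)) {
      merged[key] = mergeKeepingAudioId(existing, incoming);
    } else {
      merged[key] = incoming;
    }
  }
  return merged;
}

const part1: ChunkDict = { audio: { id: "audio_abc", transcript: "Hel" } };
const part2: ChunkDict = { audio: { id: "audio_abc", transcript: "lo" } };
const guarded = mergeKeepingAudioId(part1, part2).audio as ChunkDict;

console.log(guarded.id);         // "audio_abc"
console.log(guarded.transcript); // "Hello"
```

An empty first ID does not trigger the guard, so a later chunk can still supply the value, which matches the "first non-empty value" wording.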

Testing

  • Added comprehensive unit tests for audio ID preservation
  • Verified other audio fields still concatenate correctly
  • Ensured non-audio ID fields maintain existing behavior
  • All existing message tests pass (51 total)

Future Considerations

While this addresses the immediate audio ID issue, similar fields may need consideration:

  • Request/response IDs (request_id, response_id, conversation_id)
  • Timestamps (fields ending with _at, _time, timestamp)
  • Unique identifiers (fields ending with _id, uuid, guid)
  • Index/position fields (index, position, sequence)
  • Status/state fields (status, state, phase)
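
One way such fields could be handled, sketched purely as an illustration: choose a merge policy from the key name instead of special-casing a value prefix. Everything here is hypothetical (`MergePolicy`, `policyFor`, `mergeStringField`), and the choice of "keep first" for identifiers and "keep last" for timestamps and status fields is an assumption, not something this PR implements or proposes.

```typescript
// Hypothetical per-key merge policies for string fields; not part of this PR.
type MergePolicy = "concat" | "keepFirst" | "keepLast";

function policyFor(key: string): MergePolicy {
  // Assumption: identifiers should not change across chunks, so keep the first.
  if (key === "id" || key.endsWith("_id") || key === "uuid" || key === "guid") {
    return "keepFirst";
  }
  // Assumption: timestamps and status-like fields should reflect the latest chunk.
  if (key.endsWith("_at") || key.endsWith("_time") || key === "timestamp") {
    return "keepLast";
  }
  if (key === "status" || key === "state" || key === "phase") {
    return "keepLast";
  }
  // Default: streamed text is concatenated.
  return "concat";
}

function mergeStringField(key: string, existing: string, incoming: string): string {
  switch (policyFor(key)) {
    case "keepFirst":
      // Fall back to the incoming value when the first one is empty.
      return existing !== "" ? existing : incoming;
    case "keepLast":
      return incoming;
    default:
      return existing + incoming;
  }
}

console.log(mergeStringField("request_id", "req_1", "req_1"));       // "req_1"
console.log(mergeStringField("status", "in_progress", "completed")); // "completed"
console.log(mergeStringField("transcript", "Hel", "lo"));            // "Hello"
```

A key-based table like this is provider-agnostic, but it also risks catching fields whose values are meant to be streamed, so any real rule set would need checking against each provider's payloads.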

Fixes #8487

When streaming audio chunks from OpenAI's gpt-4o-audio-preview model,
audio IDs were being incorrectly concatenated in the _mergeDicts function,
causing "Assistant audio not found" errors on subsequent requests.

The audio ID "audio_6871323ab2fc8191b2cf516af96cd851" would become
"audio_6871323ab2fc8191b2cf516af96cd851audio_6871323ab2fc8191b2cf516af96cd851"
when merging message chunks.

Changes:
- Add special case in _mergeDicts to preserve first non-empty audio ID
- Only affects string fields with key "id" that start with "audio_"
- Other audio fields (transcript, data) continue to concatenate normally
- Add comprehensive unit tests to prevent regressions

Fixes audio streaming functionality while maintaining backward compatibility
for all other message field concatenation behavior.

hntrl (Contributor) commented Jul 17, 2025

Will call out that there is a similar issue when using the image generation tool with OpenAI. Off the top of my head, we retain each partial_image object for streaming responses, but the intended behavior is to keep only the latest one. The semantics are similar here: we only want to retain one audio ID.

I'm also not sure how much of a fan I am of baking in the assumption that IDs prefixed with audio_ should always be collated this way. This is how OpenAI views the world, but who's to say Anthropic and Google will follow suit in the near future? There's some work to be done in beefing up the base chunk class so it can reconcile these aggregation considerations.

Successfully merging this pull request may close these issues.

concat concatenates audio id twice
2 participants