chore(weave): Realtime API, support collecting audio data#6249

Open
chance-wnb wants to merge 1 commit into chance/realtime_tool_call from chance/realtime_audio_support

@chance-wnb chance-wnb commented Mar 3, 2026

Description

Adds audio capture and serialization support to the OpenAI Realtime API integration. The adapter now accumulates raw PCM audio chunks during streaming and converts them to WAV format when the audio call ends. A new serializeAudio method is exposed on the WeaveClient for manual audio serialization in call outputs.

Key changes:

  • Added pcmToWav helper function to convert 24kHz 16-bit mono PCM to WAV format
  • Modified audio event handler to accumulate PCM chunks per response ID
  • Updated closeAudioCall to serialize accumulated audio chunks and include them in call output
  • Added public serializeAudio method to WeaveClient for manual audio serialization
  • Added proper cleanup of audio chunks on disconnect and detach
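
The accumulate-then-serialize flow in the list above could look roughly like the sketch below. It is an illustration only: `PcmAccumulator` and its methods are hypothetical names, not the adapter's actual code. Chunks are typed as `Uint8Array`; Node's `Buffer` is a subclass, so Buffers can be passed in directly.

```typescript
// Hypothetical sketch: class and method names are illustrative, not the adapter's API.
class PcmAccumulator {
  // Raw PCM chunks, grouped by the response ID they belong to.
  private chunks = new Map<string, Uint8Array[]>();

  // Called for every audio chunk event: store the chunk under its response ID.
  append(responseId: string, chunk: Uint8Array): void {
    const list = this.chunks.get(responseId);
    if (list) {
      list.push(chunk);
    } else {
      this.chunks.set(responseId, [chunk]);
    }
  }

  // Called once when the audio call closes: join the chunks and drop the entry.
  take(responseId: string): Uint8Array | undefined {
    const list = this.chunks.get(responseId);
    if (!list) return undefined;
    this.chunks.delete(responseId);
    const total = list.reduce((sum, c) => sum + c.length, 0);
    const joined = new Uint8Array(total);
    let offset = 0;
    for (const c of list) {
      joined.set(c, offset);
      offset += c.length;
    }
    return joined;
  }

  // Called on disconnect or detach so no buffered audio is retained.
  clear(): void {
    this.chunks.clear();
  }

  // Number of responses that still have buffered audio.
  get pending(): number {
    return this.chunks.size;
  }
}
```

In this sketch, `take` would be called from the close handler and its result handed to the WAV conversion, while `clear` covers the disconnect and detach paths.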

Testing

  • Locally tested
  • Unit tests are expected later, in the upstack PRs

Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@chance-wnb chance-wnb marked this pull request as ready for review March 3, 2026 02:10
@chance-wnb chance-wnb requested a review from a team as a code owner March 3, 2026 02:10

codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Comment on lines +42 to +62
function pcmToWav(pcm: Buffer): Buffer {
  const channels = 1;
  const sampleRate = 24000;
  const bitDepth = 16;
  const wav = Buffer.alloc(44 + pcm.length); // 44-byte header + samples
  wav.write('RIFF', 0);
  wav.writeUInt32LE(36 + pcm.length, 4); // RIFF chunk size: total size minus 8
  wav.write('WAVE', 8);
  wav.write('fmt ', 12);
  wav.writeUInt32LE(16, 16); // fmt chunk size
  wav.writeUInt16LE(1, 20); // PCM
  wav.writeUInt16LE(channels, 22);
  wav.writeUInt32LE(sampleRate, 24);
  wav.writeUInt32LE(sampleRate * channels * (bitDepth / 8), 28); // byte rate
  wav.writeUInt16LE(channels * (bitDepth / 8), 32); // block align
  wav.writeUInt16LE(bitDepth, 34);
  wav.write('data', 36);
  wav.writeUInt32LE(pcm.length, 40); // data chunk size
  wav.set(pcm, 44); // Uint8Array.set: accepts ArrayLike<number>, no Buffer-copy type issues
  return wav;
}
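
One way to sanity-check the header layout written above is to read the fields back from the same offsets. The sketch below does that with a `DataView`; `WavHeader`, `readTag`, and `parseWavHeader` are hypothetical names introduced here for illustration and are not part of this PR. It only understands the fixed 44-byte layout produced by `pcmToWav`, not WAV files with extra chunks.

```typescript
// Hypothetical helper for illustration: reads back the 44-byte header layout.
interface WavHeader {
  audioFormat: number; // 1 means uncompressed PCM
  channels: number;
  sampleRate: number;
  byteRate: number;
  blockAlign: number;
  bitDepth: number;
  dataLength: number;
}

// Read four bytes at `offset` as an ASCII tag such as "RIFF" or "data".
function readTag(view: DataView, offset: number): string {
  let s = "";
  for (let i = 0; i < 4; i++) {
    s += String.fromCharCode(view.getUint8(offset + i));
  }
  return s;
}

function parseWavHeader(wav: Uint8Array): WavHeader {
  if (wav.length < 44) throw new Error("too short for a 44-byte WAV header");
  // Respect byteOffset: a Buffer can be a view into a larger ArrayBuffer.
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength);
  if (readTag(view, 0) !== "RIFF" || readTag(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE container");
  }
  if (readTag(view, 12) !== "fmt " || readTag(view, 36) !== "data") {
    throw new Error("unexpected chunk layout");
  }
  // All multi-byte fields are little-endian, matching the writeUInt*LE calls.
  return {
    audioFormat: view.getUint16(20, true),
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    byteRate: view.getUint32(28, true),
    blockAlign: view.getUint16(32, true),
    bitDepth: view.getUint16(34, true),
    dataLength: view.getUint32(40, true),
  };
}
```

Passing the output of `pcmToWav` to `parseWavHeader` should report 1 channel, a 24000 Hz sample rate, 16 bits per sample, and a `dataLength` equal to the input PCM length.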

Collaborator

Is this a costly operation? It seems to just place the data in the right container. Is there any memory copy?

Contributor Author

@chance-wnb chance-wnb Mar 9, 2026

As far as I know, the JavaScript Buffer data structure is already the right tool for byte-wise operations. It is already much more efficient than classic JS arrays.

wav.set(pcm, 44);

This is the memory copy part. The preceding header writes are trivial: each one is constant time, even though there are many of them.

> Is this a costly operation

I think it is fine. I can't think of any other way to do it; as far as I know, the format has to be converted.

PS: this is apparently AI-generated code; I am not capable of writing such a thing myself, lol. I guess it is better than importing a third-party library.

Contributor Author

> Is this a costly operation

As a friendly reminder, the audio stream conversion is done once per closeAudioCall event.

Let me know if you feel something is fishy and have improvement proposals. Thanks!
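
To put the cost of that single copy in perspective, the buffer size follows directly from the format constants in `pcmToWav`: 24000 samples per second, 2 bytes per sample, 1 channel. The sketch below works through that arithmetic; `pcmBytes` and `wavBytes` are hypothetical helper names used only for this illustration.

```typescript
// Format constants taken from the pcmToWav helper in this PR.
const SAMPLE_RATE = 24000; // samples per second
const CHANNELS = 1; // mono
const BYTES_PER_SAMPLE = 2; // 16-bit samples
const WAV_HEADER_BYTES = 44;

// Bytes of raw PCM produced by `seconds` of audio in this format.
function pcmBytes(seconds: number): number {
  return Math.round(seconds * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE);
}

// Size of the buffer that pcmToWav allocates and fills for that much audio.
function wavBytes(seconds: number): number {
  return WAV_HEADER_BYTES + pcmBytes(seconds);
}

console.log(pcmBytes(1)); // 48000
console.log(wavBytes(60)); // 2880044
```

So one minute of response audio is roughly 2.9 MB, allocated and copied once per close. Note that `Buffer.alloc` zero-fills the new buffer, so the allocation itself is also linear in the PCM length, not only the `set` call.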

Collaborator

Don't we support the original format via Content? cc @zbirenbaum

Contributor

PCM detection doesn't work properly (maybe that has changed with the use of python magic), so I had to convert to WAV for my implementation as well. Maybe we could solve this by manually setting the MIME type?
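
A likely reason detection struggles here: content sniffers generally match magic bytes at the start of a file, and raw PCM is just samples with no header, whereas a WAV file starts with `RIFF` at offset 0 and `WAVE` at offset 8. The sketch below is a minimal magic-byte check that shows the difference. `sniffAudio` is a hypothetical helper; it is an assumption about how sniffing behaves in general, not a description of how Content or python magic actually detect types.

```typescript
// Hypothetical helper: classify bytes by magic number only.
type SniffResult = "audio/wav" | "unknown";

function sniffAudio(bytes: Uint8Array): SniffResult {
  // Read up to four bytes at `offset` as ASCII; short inputs yield a short string.
  const ascii = (offset: number): string => {
    let s = "";
    for (let i = 0; i < 4 && offset + i < bytes.length; i++) {
      s += String.fromCharCode(bytes[offset + i]);
    }
    return s;
  };
  // WAV is a RIFF container whose form type is "WAVE".
  if (ascii(0) === "RIFF" && ascii(8) === "WAVE") return "audio/wav";
  // Raw PCM has no signature, so there is nothing to match on.
  return "unknown";
}
```

With a check like this, the output of `pcmToWav` is recognized while the raw PCM chunks are not, which is consistent with needing either the WAV wrapper or an explicitly supplied MIME type.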
