chore(weave): Realtime API, support collecting audio data #6249
chance-wnb wants to merge 1 commit into chance/realtime_tool_call
function pcmToWav(pcm: Buffer): Buffer {
  const channels = 1;
  const sampleRate = 24000;
  const bitDepth = 16;
  const wav = Buffer.alloc(44 + pcm.length);
  wav.write('RIFF', 0);
  wav.writeUInt32LE(36 + pcm.length, 4);
  wav.write('WAVE', 8);
  wav.write('fmt ', 12);
  wav.writeUInt32LE(16, 16);
  wav.writeUInt16LE(1, 20); // PCM
  wav.writeUInt16LE(channels, 22);
  wav.writeUInt32LE(sampleRate, 24);
  wav.writeUInt32LE(sampleRate * channels * (bitDepth / 8), 28);
  wav.writeUInt16LE(channels * (bitDepth / 8), 32);
  wav.writeUInt16LE(bitDepth, 34);
  wav.write('data', 36);
  wav.writeUInt32LE(pcm.length, 40);
  wav.set(pcm, 44); // Uint8Array.set: accepts ArrayLike<number>, no Buffer-copy type issues
  return wav;
}
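To sanity-check the header this helper writes, the 44 bytes can be read back field by field. The sketch below is not part of the PR: `parseWavHeader`, `wavDurationSeconds`, and the `WavHeader` type are hypothetical names, and it assumes the same canonical 44-byte layout used above (a single `fmt ` chunk followed directly by `data`).

```typescript
import { Buffer } from "node:buffer";

// Hypothetical type for illustration; field names are not from the PR.
interface WavHeader {
  audioFormat: number; // 1 means uncompressed PCM
  channels: number;
  sampleRate: number;
  byteRate: number; // sampleRate * channels * bytesPerSample
  blockAlign: number; // channels * bytesPerSample
  bitDepth: number;
  dataLength: number; // number of PCM bytes after the header
}

// Reads back a canonical 44-byte PCM WAV header (offsets mirror pcmToWav).
function parseWavHeader(wav: Buffer): WavHeader {
  if (wav.length < 44) {
    throw new Error("buffer too short for a 44-byte WAV header");
  }
  const tag = (offset: number): string =>
    wav.toString("ascii", offset, offset + 4);
  if (
    tag(0) !== "RIFF" ||
    tag(8) !== "WAVE" ||
    tag(12) !== "fmt " ||
    tag(36) !== "data"
  ) {
    throw new Error("not a canonical 44-byte PCM WAV header");
  }
  return {
    audioFormat: wav.readUInt16LE(20),
    channels: wav.readUInt16LE(22),
    sampleRate: wav.readUInt32LE(24),
    byteRate: wav.readUInt32LE(28),
    blockAlign: wav.readUInt16LE(32),
    bitDepth: wav.readUInt16LE(34),
    dataLength: wav.readUInt32LE(40),
  };
}

// Duration follows from the header alone: PCM bytes divided by bytes per second.
function wavDurationSeconds(header: WavHeader): number {
  return header.dataLength / header.byteRate;
}
```

For 24 kHz 16-bit mono audio the byte rate is 24000 * 1 * 2 = 48000 bytes per second, so 48000 bytes of PCM should come back as a one-second clip.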
Is this a costly operation? It seems to just place the data in the right container. Is there any memory copy?
As far as I know, the JavaScript Buffer is already the right tool for byte-wise operations. It is much more efficient than classic JS arrays.
wav.set(pcm, 44);
This is the memory copy. The preceding lines are trivial (constant time, even though there are many of them).

> Is this a costly operation

I think it is all right. I can't think of another way to do it. As far as I know, the format must be converted.
PS: this is apparently AI-generated code; I am not capable of writing such a thing myself, lol. I guess it is better than importing a third-party library.
> Is this a costly operation

As a friendly reminder, the audio stream conversion is done once per closeAudioCall event.
Let me know if anything looks fishy or if you have improvement proposals. Thanks!
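To make the "once per call" cost concrete, here is a rough sketch of the accumulate-then-convert pattern. It is not the adapter code from this PR: `PcmAccumulator` and its methods are hypothetical, and it assumes audio deltas arrive as base64-encoded PCM strings. Appending a chunk only stores a reference to it; the full copy happens once, in `drain()`, at the point where the WAV conversion would run.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical accumulator for illustration; not the adapter's real class.
class PcmAccumulator {
  private chunks: Buffer[] = [];
  private totalBytes = 0;

  // Assumption: each streamed delta is a base64 string of raw PCM bytes.
  appendBase64(delta: string): void {
    const chunk = Buffer.from(delta, "base64");
    this.chunks.push(chunk); // no concatenation here, just keep a reference
    this.totalBytes += chunk.length;
  }

  get byteLength(): number {
    return this.totalBytes;
  }

  // Joins all chunks in a single copy and resets the accumulator.
  // This is where a call like pcmToWav(pcm) would run, once per audio call.
  drain(): Buffer {
    const pcm = Buffer.concat(this.chunks, this.totalBytes);
    this.chunks = [];
    this.totalBytes = 0;
    return pcm;
  }
}
```

With this shape, the work per streamed chunk is a base64 decode plus an array push, and the O(n) copies (one in `Buffer.concat`, one in the WAV conversion) are paid only when the call closes.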
Don't we support the original format via Content? cc @zbirenbaum
PCM detection doesn't work properly (maybe that has changed with the use of python magic). I had to convert to WAV for my implementation as well. Maybe we could solve this by manually setting the MIME type?
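For context on why detection struggles: raw PCM is just samples with no header, so there are no magic bytes to sniff, whereas a WAV file starts with `RIFF` and has `WAVE` at offset 8. The sketch below illustrates the "set the MIME type manually" idea in isolation. `sniffWav` and `resolveAudioMimeType` are hypothetical helpers, not the Content API, and the explicit MIME type is whatever string the caller decides to pass.

```typescript
import { Buffer } from "node:buffer";

// True when the bytes start with a RIFF/WAVE signature.
function sniffWav(data: Buffer): boolean {
  return (
    data.length >= 12 &&
    data.toString("ascii", 0, 4) === "RIFF" &&
    data.toString("ascii", 8, 12) === "WAVE"
  );
}

// Hypothetical resolver: an explicit MIME type always wins; otherwise fall
// back to sniffing, which can only succeed for formats that carry a signature.
function resolveAudioMimeType(data: Buffer, explicitMimeType?: string): string {
  if (explicitMimeType) {
    return explicitMimeType;
  }
  if (sniffWav(data)) {
    return "audio/wav";
  }
  // Headerless PCM ends up here: nothing in the bytes identifies the format.
  throw new Error("cannot detect audio type from bytes; pass an explicit MIME type");
}
```

If the original format were kept, the sample rate, bit depth, and channel count would also need to travel alongside the bytes, since raw PCM does not record them; wrapping in WAV stores that metadata in the file itself.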

Description
Adds audio capture and serialization support to the OpenAI Realtime API integration. The adapter now accumulates raw PCM audio chunks during streaming and converts them to WAV format when the audio call ends. A new serializeAudio method is exposed on the WeaveClient for manual audio serialization in call outputs.

Key changes:
- pcmToWav helper function to convert 24kHz 16-bit mono PCM to WAV format
- closeAudioCall to serialize accumulated audio chunks and include them in call output
- serializeAudio method to WeaveClient for manual audio serialization

Screenshot
Testing