
Commit 701d447

fix tabs, move content, edits from draft readers
1 parent 96fcd92 commit 701d447

1 file changed: +31 -44 lines changed

website/docs/main/compatibility-api/guides/voice/nodejs/realtime-streaming-to-openai/index.mdx

Lines changed: 31 additions & 44 deletions
@@ -25,14 +25,30 @@ In this guide, we will build a Node.js application that serves a
 [cXML Script][cxml]
 that initiates a two-way (bidirectional)
 [`<Stream>`][bidir-stream]
-to the OpenAI Realtime API.
-When a caller initiates a SIP or
-<Tooltips tip="Public Switched Telephone Network">PSTN</Tooltips>
-call to the assigned phone number,
-the SignalWire platform requests and runs the script.
+to a Speech-to-Speech model on the OpenAI Realtime API.
+When a caller initiates a call to the assigned phone number,
+the SignalWire platform requests and runs the cXML script.
+
+```mermaid
+graph LR
+A[Phone call] --> B[SignalWire]
+B --> C[WebSocket]
+C --> D[Transport layer]
+D --> E[OpenAI Realtime]
+E --> D
+D --> C
+C --> B
+B --> A
+```

 {/* This architectural explainer is a DRAFT. It could be useful, but needs further refinement.

+**Audio Flow Details:**
+- **Inbound**: Phone → SignalWire → Base64 → Transport → ArrayBuffer → OpenAI
+- **Outbound**: OpenAI → ArrayBuffer → Transport → Base64 → SignalWire → Phone
+- **Latency**: Typically 150-300ms end-to-end
+- **Quality**: Depends on codec choice (G.711 vs PCM16)
+
 The key architectural components involved are:

 - **cXML server:** Our Fastify server serves dynamic cXML to the SignalWire platform.
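
For readers tracing those hops, here is a minimal Node.js sketch of the Base64 decode/encode step the "Audio Flow Details" bullets describe. It is not code from the example repository; the helper names and the `media` message shape are illustrative assumptions only.

```typescript
// Minimal sketch of the inbound/outbound conversions described above.
// Helper names and the message shape are assumptions, not code from the
// cXML-realtime-agent-stream repository.

/** Inbound: SignalWire media frame (Base64 payload) -> raw bytes for the model. */
function decodeInboundFrame(payload: string): ArrayBuffer {
  const buf = Buffer.from(payload, "base64"); // Base64 -> Node Buffer
  // Copy out a standalone ArrayBuffer; a Buffer may be a view into a shared pool.
  return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength) as ArrayBuffer;
}

/** Outbound: model audio bytes -> Base64 payload inside a SignalWire media message. */
function encodeOutboundFrame(audio: ArrayBuffer, streamSid: string) {
  return {
    event: "media",
    streamSid,
    media: { payload: Buffer.from(audio).toString("base64") },
  };
}
```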
@@ -58,13 +74,6 @@ flowchart TD

 */}

-Wondering why this guide uses cXML to stream to OpenAI, instead of using
-the [native SWML AI integration](/swml/methods/ai)?
-Since OpenAI's Realtime API is built for Speech-to-Speech (or "Voice-to-Voice") models,
-the SignalWire platform must stream audio directly to and from OpenAI
-instead of handling the STT, TTS, and LLM aspects with our integrated toolchain.
-This guide showcases the flexibility of the SignalWire platform to integrate with emerging unified audio models.
-
 ## Prerequisites

 Before you begin, ensure you have:
@@ -88,8 +97,8 @@ Before you begin, ensure you have:
 Clone the SignalWire Solutions repository, navigate to this example, and install.

 ```bash
-git clone https://github.com/signalwire/solutions-architecture
-cd code/cxml-realtime-agent-stream
+git clone https://github.com/signalwire/cXML-realtime-agent-stream
+cd cxml-realtime-agent-stream
 npm install
 ```

@@ -98,11 +107,11 @@ npm install
 <div class="col col--4">

 <Card
-title="GitHub repository"
-href="https://github.com/signalwire/solutions-architecture"
+title="Project repository"
+href="https://github.com/signalwire/cXML-realtime-agent-stream"
 icon={<MdCode />}
 >
-The SignalWire Solutions repository
+View the source code on GitHub
 </Card>

 </div>
@@ -111,7 +120,7 @@ The SignalWire Solutions repository

 ### Add OpenAI credentials

-Select **Local** or **Docker**
+Select the **Local** or **Docker** tab below depending on where you plan to run the application.

 <Tabs groupId="deploy">
 <TabItem value="local" label="Local">
@@ -157,7 +166,7 @@ npm start

 </TabItem>

-<TabItem value="prod" label="Docker">
+<TabItem value="docker" label="Docker">

 ```bash
 docker-compose up --build signalwire-assistant
@@ -202,7 +211,7 @@ Select the **Local** tab below if you ran the application locally, and the **Doc
 </div>

 <Tabs>
-<TabItem value="dev" label="Local">
+<TabItem value="local" label="Local">
 Use ngrok to expose port 5050 on your development machine:

 ```bash
@@ -212,7 +221,7 @@ ngrok http 5050
 Append `/incoming-call` to the HTTPS URL returned by ngrok.
 https://abc123.ngrok.io/incoming-call
 </TabItem>
-<TabItem value="prod" label="Docker">
+<TabItem value="docker" label="Docker">
 For production environments, set your server URL + `/incoming-call`:
 ```
 https://your-domain.com/incoming-call
@@ -227,7 +236,7 @@ For this example, you **must** include `/incoming-call` at the end of your URL.
 - Give the cXML Script a descriptive name, such as "AI Voice Assistant".
 - Save your new Resource.

-### Assign SIP address or phone number
+### Assign phone number or SIP address

 To test your AI assistant, create a SIP address or phone number and assign it as a handler for your cXML Script Resource.

@@ -887,28 +896,6 @@ All of this happens in real-time during the conversation.

 ---

-## Audio Processing
-
-### Audio Processing Pipeline
-
-```mermaid
-graph LR
-A[Phone Call] --> B[SignalWire]
-B --> C[WebSocket]
-C --> D[Transport Layer]
-D --> E[OpenAI Realtime]
-E --> D
-D --> C
-C --> B
-B --> A
-```
-
-**Audio Flow Details:**
-- **Inbound**: Phone → SignalWire → Base64 → Transport → ArrayBuffer → OpenAI
-- **Outbound**: OpenAI → ArrayBuffer → Transport → Base64 → SignalWire → Phone
-- **Latency**: Typically 150-300ms end-to-end
-- **Quality**: Depends on codec choice (G.711 vs PCM16)
-
 ### Codec Selection Guide

 Choose the right audio codec for your use case:
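
The codec guide that follows in the full document compares G.711 and PCM16. As a hedged reference only (this is not code from the repository, and the field names come from OpenAI's published Realtime API rather than this guide), a session might pin the codec like this:

```typescript
// Illustrative codec selection for the OpenAI Realtime session. The field
// names follow OpenAI's published `session.update` event at the time of
// writing; confirm them against the current API reference and against how
// this example's transport layer actually configures its session.
type RealtimeAudioFormat = "g711_ulaw" | "pcm16";

// G.711 u-law matches the 8 kHz telephony stream; PCM16 trades bandwidth for fidelity.
const codec: RealtimeAudioFormat = "g711_ulaw";

const sessionUpdate = {
  type: "session.update",
  session: {
    input_audio_format: codec,
    output_audio_format: codec,
  },
};

// Send on the already-open Realtime WebSocket, e.g.:
// openaiSocket.send(JSON.stringify(sessionUpdate));
console.log(JSON.stringify(sessionUpdate));
```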
