@@ -25,14 +25,30 @@ In this guide, we will build a Node.js application that serves a
2525[cXML Script][cxml]
2626that initiates a two-way (bidirectional)
2727[`<Stream>`][bidir-stream]
28- to the OpenAI Realtime API.
29- When a caller initiates a SIP or
30- <Tooltips tip="Public Switched Telephone Network">PSTN</Tooltips>
31- call to the assigned phone number,
32- the SignalWire platform requests and runs the script.
28+ to a Speech-to-Speech model on the OpenAI Realtime API.
29+ When a caller initiates a call to the assigned phone number,
30+ the SignalWire platform requests and runs the cXML script.
31+
32+ ``` mermaid
33+ graph LR
34+ A[Phone call] --> B[SignalWire]
35+ B --> C[WebSocket]
36+ C --> D[Transport layer]
37+ D --> E[OpenAI Realtime]
38+ E --> D
39+ D --> C
40+ C --> B
41+ B --> A
42+ ```
3343
3444{ /* This architectural explainer is a DRAFT. It could be useful, but needs further refinement.
3545
46+ **Audio Flow Details:**
47+ - **Inbound**: Phone → SignalWire → Base64 → Transport → ArrayBuffer → OpenAI
48+ - **Outbound**: OpenAI → ArrayBuffer → Transport → Base64 → SignalWire → Phone
49+ - **Latency**: Typically 150-300ms end-to-end
50+ - **Quality**: Depends on codec choice (G.711 vs PCM16)
51+
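The Base64 and ArrayBuffer steps in the audio flow above can be sketched as a pair of conversions. This is a minimal illustration for Node.js, not the repository's actual transport code; the function names `base64ToArrayBuffer` and `arrayBufferToBase64` are hypothetical and chosen only for this example.

```typescript
// Hypothetical helpers illustrating the Base64 <-> ArrayBuffer steps.
// Assumes a Node.js runtime, where Buffer is available globally.

// Inbound: decode a base64 audio payload into a standalone ArrayBuffer.
function base64ToArrayBuffer(payload: string): ArrayBuffer {
  const bytes = Buffer.from(payload, "base64");
  // Copy into a fresh ArrayBuffer, because Node may back small Buffers
  // with a shared memory pool.
  const out = new ArrayBuffer(bytes.byteLength);
  new Uint8Array(out).set(bytes);
  return out;
}

// Outbound: encode raw audio bytes as base64 text for a JSON message.
function arrayBufferToBase64(audio: ArrayBuffer): string {
  return Buffer.from(audio).toString("base64");
}
```

A round trip through both helpers returns the original bytes, which is a quick way to check the conversion independently of any codec.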
3652The key architectural components involved are:
3753
3854- **cXML server:** Our Fastify server serves dynamic cXML to the SignalWire platform.
@@ -58,13 +74,6 @@ flowchart TD
5874
5975*/ }
6076
61- Wondering why this guide uses cXML to stream to OpenAI, instead of using
62- the [native SWML AI integration](/swml/methods/ai)?
63- Since OpenAI's Realtime API is built for Speech-to-Speech (or "Voice-to-Voice") models,
64- the SignalWire platform must stream audio directly to and from OpenAI
65- instead of handling the STT, TTS, and LLM aspects with our integrated toolchain.
66- This guide showcases the flexibility of the SignalWire platform to integrate with emerging unified audio models.
67-
6877## Prerequisites
6978
7079Before you begin, ensure you have:
@@ -88,8 +97,8 @@ Before you begin, ensure you have:
8897Clone the SignalWire Solutions repository, navigate to this example, and install.
8998
9099``` bash
91- git clone https://github.com/signalwire/solutions-architecture
92- cd code/ cxml-realtime-agent-stream
100+ git clone https://github.com/signalwire/cXML-realtime-agent-stream
101+ cd cxml-realtime-agent-stream
93102npm install
94103```
95104
@@ -98,11 +107,11 @@ npm install
98107<div class="col col--4">
99108
100109<Card
101- title="GitHub repository"
102- href="https://github.com/signalwire/solutions-architecture"
110+ title="Project repository"
111+ href="https://github.com/signalwire/cXML-realtime-agent-stream"
103112 icon={<MdCode />}
104113 >
105- The SignalWire Solutions repository
114+ View the source code on GitHub
106115</Card>
107116
108117</div>
@@ -111,7 +120,7 @@ The SignalWire Solutions repository
111120
112121### Add OpenAI credentials
113122
114- Select **Local** or **Docker**
123+ Select the **Local** or **Docker** tab below, depending on where you plan to run the application.
115124
116125<Tabs groupId="deploy">
117126<TabItem value="local" label="Local">
@@ -157,7 +166,7 @@ npm start
157166
158167</TabItem>
159168
160- <TabItem value="prod" label="Docker">
169+ <TabItem value="docker" label="Docker">
161170
162171``` bash
163172docker-compose up --build signalwire-assistant
@@ -202,7 +211,7 @@ Select the **Local** tab below if you ran the application locally, and the **Doc
202211</div>
203212
204213<Tabs>
205- <TabItem value="dev" label="Local">
214+ <TabItem value="local" label="Local">
206215Use ngrok to expose port 5050 on your development machine:
207216
208217``` bash
@@ -212,7 +221,7 @@ ngrok http 5050
212221Append `/incoming-call` to the HTTPS URL returned by ngrok.
213222https://abc123.ngrok.io/incoming-call
214223</TabItem>
215- <TabItem value="prod" label="Docker">
224+ <TabItem value="docker" label="Docker">
216225For production environments, use your server URL followed by `/incoming-call`:
217226 ```
218227 https://your-domain.com/incoming-call
@@ -227,7 +236,7 @@ For this example, you **must** include `/incoming-call` at the end of your URL.
227236- Give the cXML Script a descriptive name, such as "AI Voice Assistant".
228237- Save your new Resource.
229238
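For orientation, a handler at `/incoming-call` would typically answer the platform's request with a cXML document that opens the bidirectional stream. The sketch below is an assumption rather than the repository's code: it assumes the `<Connect>` and `<Stream>` verbs, and both the helper name `buildStreamCxml` and the WebSocket route `/media-stream` are hypothetical.

```typescript
// Hypothetical helper: builds the cXML document returned to SignalWire.
// `host` is assumed to be a trusted hostname such as "your-domain.com";
// `streamPath` is an illustrative WebSocket route, not taken from the repository.
function buildStreamCxml(host: string, streamPath: string = "/media-stream"): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="wss://${host}${streamPath}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

Calling `buildStreamCxml("your-domain.com")` yields a document whose `<Stream>` URL points at `wss://your-domain.com/media-stream`; a server would send that string back with an XML content type.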
230- ### Assign SIP address or phone number
239+ ### Assign phone number or SIP address
231240
232241To test your AI assistant, create a SIP address or phone number and assign it as a handler for your cXML Script Resource.
233242
@@ -887,28 +896,6 @@ All of this happens in real-time during the conversation.
887896
888897---
889898
890- ## Audio Processing
891-
892- ### Audio Processing Pipeline
893-
894- ```mermaid
895- graph LR
896- A[Phone Call] --> B[SignalWire]
897- B --> C[WebSocket]
898- C --> D[Transport Layer]
899- D --> E[OpenAI Realtime]
900- E --> D
901- D --> C
902- C --> B
903- B --> A
904- ```
905-
906- **Audio Flow Details:**
907- - **Inbound**: Phone → SignalWire → Base64 → Transport → ArrayBuffer → OpenAI
908- - **Outbound**: OpenAI → ArrayBuffer → Transport → Base64 → SignalWire → Phone
909- - **Latency**: Typically 150-300ms end-to-end
910- - **Quality**: Depends on codec choice (G.711 vs PCM16)
911-
912899### Codec Selection Guide
913900
914901Choose the right audio codec for your use case: