Note: The "Connect to Autopilot Voice" component has additional usage costs.
Talkdesk Conversation Orchestrator enables real-time audio streaming between Talkdesk and third-party solutions.
This product shares the audio of an inbound call with an external system that can be connected to Autopilot or to another third-party system. This allows you to “Bring Your Own Bot” (BYOB) into Talkdesk by using the Connect to Autopilot Voice Studio component to stream the audio bidirectionally. When you choose the “External Voice Stream” option on the component, the call audio is streamed to a third-party WebSocket, together with orchestration messages and contextual information about the voice interaction.
Several types of events occur during the stream's life cycle. These events are represented as WebSocket messages:
- Connected: The first message sent, once a WebSocket connection is established.
- Start: The message containing necessary metadata about the conversation stream that is sent immediately after the “Connected” message. It is only sent once, at the beginning of the conversation stream.
- Media: The message that encapsulates the raw audio data.
- Stop: The message sent when the conversation stream is stopped or the call has ended.
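The ordering described in the list above can be sketched as a small state check. This is only an illustration: the transition table and the function name `follows_lifecycle` are assumptions inferred from the list, not part of the Talkdesk protocol. "Mark" events, which are described later in the protocol section, are allowed mid-stream.

```python
# Assumed ordering of Talkdesk-to-partner events: connected, start,
# then any number of media/mark events, then stop.
ALLOWED_NEXT = {
    None: {"connected"},
    "connected": {"start"},
    "start": {"media", "mark", "stop"},
    "media": {"media", "mark", "stop"},
    "mark": {"media", "mark", "stop"},
    "stop": set(),
}


def follows_lifecycle(events):
    """Return True if the sequence of event names respects the assumed order."""
    previous = None
    for name in events:
        if name not in ALLOWED_NEXT.get(previous, set()):
            return False
        previous = name
    return True
```

A receiver could run a check like this while logging, to spot a stream that sends audio before its "Start" metadata has arrived.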
For a bidirectional audio stream, you can send the following WebSocket messages from your system to Talkdesk:
- Media: A message in which you can encapsulate the raw audio data.
- Mark: A message that you can send after the Media message to be notified when playback of the audio you sent has completed.
- Stop: A message indicating that the audio stream should stop, together with the result of the operation.
- Clear: A message that interrupts playback by cancelling all buffered Media messages that have not yet been played.
The following diagram shows how both unidirectional and bidirectional audio streams work over WebSocket messages:
To understand the structure of these WebSocket messages, please review the WebSocket Messages protocols section below.
The “Connect to Autopilot Voice” component can be added at any step of the Studio flow and can be configured as follows.
With the “stream and hold” capability, the stream starts when the call reaches the component and stops only when you send a “Stop” message that reports success, failure, or the need to escalate the call to an agent.
1. Add a Connect to Autopilot Voice component to an exit of any studio component. In this case, we will add it to the exit on the "Initial step" component for demonstration purposes (exit “OK”).
2. On the “Connect Autopilot” component, select “External Voice Stream” and configure the “Voice stream URL” to the WSS connection you want the audio to be streamed to and from. The selection of “External Voice Stream” activates the Conversation Orchestrator that starts triggering different events represented via WebSocket messages.
3. If you need the call to be escalated to a live agent, then configure the “Escalation” exit and add an Assignment and Dial component step.
4. Set up the “Connect Autopilot” exits, depending on the flow you would like to define. In this example, the “Execution Error” exit is also connected to the “Assignment and Dial” component so that an agent helps the caller if an error occurs.
WebSocket Messages Protocols
Each message sent is a JSON string. You can determine which type of event is occurring by using the event property of every JSON object.
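As a minimal sketch of routing on the event property, the snippet below parses one JSON text frame and hands it to a matching handler. The `dispatch` function and the `handlers` mapping are hypothetical names chosen for illustration. Silently ignoring unknown events is a design assumption, not documented behavior.

```python
import json


def dispatch(raw_message, handlers):
    """Parse one WebSocket text frame and route it by its "event" property."""
    message = json.loads(raw_message)
    handler = handlers.get(message.get("event"))
    if handler is None:
        # Assumption: unknown or missing event types are ignored, not fatal.
        return None
    return handler(message)


# Hypothetical handlers; real ones would update call state or play audio.
handlers = {
    "connected": lambda m: f"protocol {m['protocol']} v{m['version']}",
    "stop": lambda m: f"stream {m['streamSid']} stopped",
}
```

Keeping the handlers in a dictionary makes it easy to add "start", "media", and "mark" handlers without touching the routing logic.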
Messages sent from Talkdesk to the Partner
Connected Message
"Connected" is the first message sent once a WebSocket connection is established.
Parameter | Description
event | The value of connected.
protocol | Defines the protocol for the WebSocket connection's lifetime, e.g. "Call".
version | Semantic version of the protocol.
Example:
{
  "event": "connected",
  "protocol": "Call",
  "version": "1.0.0"
}
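Because the version field is a semantic version, a receiver may want to check the major version before processing further messages. The following is a hedged sketch: `SUPPORTED_MAJOR` and `is_supported` are hypothetical, and the choice to reject other major versions is an assumption rather than a documented requirement.

```python
import json

SUPPORTED_MAJOR = 1  # assumption: this client was written against protocol 1.x


def is_supported(raw_message):
    """Return True for a "connected" message whose major version matches."""
    message = json.loads(raw_message)
    if message.get("event") != "connected":
        return False
    major = int(message["version"].split(".")[0])
    return major == SUPPORTED_MAJOR
```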
Start Message
The "Start" message contains important metadata about the conversation stream and is sent immediately after the "Connected" message. It is only sent once, at the start of the conversation stream.
Parameter | Description
event | The value of start.
sequenceNumber | The number used to keep track of message-sending order. The first message starts with "1" and is then incremented.
start | An object containing stream metadata.
start.streamSid | The unique identifier of the stream.
start.accountSid | The account identifier that created the stream.
start.callSid | The call identifier from where the stream was started.
start.tracks | An array of values that indicates what media flows to expect in subsequent messages. Values include inbound and outbound.
start.customParameters | An object that represents the Custom Parameters set when defining the Stream (explained below).
start.mediaFormat | An object containing the format of the payload in the media messages.
start.mediaFormat.encoding | The encoding of the data in the upcoming payload. The value will always be audio/x-mulaw.
start.mediaFormat.sampleRate | The sample rate in Hertz of the upcoming audio data. The value is always 8000.
start.mediaFormat.channels | The number of channels in the input audio data. The value will always be 1.
streamSid | The unique identifier of the stream.
Example:
{
  "event": "start",
  "sequenceNumber": "2",
  "start": {
    "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
    "accountSid": "AC123",
    "callSid": "CA123",
    "tracks": [
      "inbound",
      "outbound"
    ],
    "customParameters": {
      "extra_parameters": {
        "initial_timestamp": "1668428901027",
        "flow_id": ""
      },
      "account_id": "5cee471c844dda000d67a428",
      "interaction_id": "073c9e0e1ab44c8a8085da2b08c1ecf9",
      "stream_url": "wss://my.service.com/socket/messages",
      "correlation_id": "614c537a2021746aead25356",
      "business_hours": "",
      "type": "inbound",
      "initial_timestamp": "2022-01-16T16:12:47.254Z",
      "flow_id": ""
    },
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    }
  },
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
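The mediaFormat values make it possible to convert payload sizes into audio durations. As a worked sketch: mu-law (G.711) stores each sample in 8 bits, so 8000 Hz mono audio amounts to 8000 bytes per second, or 8 bytes per millisecond. The function names below are hypothetical.

```python
def bytes_per_millisecond(media_format):
    """Payload bytes per millisecond of audio for the announced media format."""
    if media_format["encoding"] != "audio/x-mulaw":
        raise ValueError("unexpected encoding: " + media_format["encoding"])
    bytes_per_sample = 1  # mu-law (G.711) uses 8 bits per sample
    per_second = media_format["sampleRate"] * media_format["channels"] * bytes_per_sample
    return per_second / 1000


def audio_duration_ms(decoded_payload, media_format):
    """Duration in milliseconds of a base64-decoded media payload."""
    return len(decoded_payload) / bytes_per_millisecond(media_format)


media_format = {"encoding": "audio/x-mulaw", "sampleRate": 8000, "channels": 1}
```

With this format, a decoded payload of 160 bytes corresponds to 20 ms of audio.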
To enrich the start message produced by the Talkdesk Global Communications Network (GCN), Talkdesk adds the following information to the "customParameters" section of the payload:
- extra_parameters: Deprecated field.
- account_id: The Talkdesk ID of the account.
- interaction_id: The unique ID of the Talkdesk interaction.
- stream_url: The WebSocket URL to which the audio is streamed.
- correlation_id: The ID that identifies the call throughout its lifetime, across all interaction_id values that belong to that call.
- business_hours: Empty for unidirectional audio streams. For bidirectional streams, it indicates the agent's business hours information.
- type: The media flow, which is either inbound or outbound.
- initial_timestamp: The timestamp of the moment the stream started.
- flow_id: Deprecated field.
Example:
"customParameters": {
  "extra_parameters": {
    "initial_timestamp": "1668428901027",
    "flow_id": ""
  },
  "account_id": "5cee471c844dda000d67a428",
  "interaction_id": "073c9e0e1ab44c8a8085da2b08c1ecf9",
  "stream_url": "wss://my.service.com/socket/messages",
  "correlation_id": "614c537a2021746aead25356",
  "business_hours": "",
  "type": "inbound",
  "initial_timestamp": "2022-01-16T16:12:47.254Z",
  "flow_id": ""
}
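As a small illustration of consuming these fields, the sketch below parses initial_timestamp into a timezone-aware datetime. It assumes the value is always an ISO 8601 string with a "Z" suffix, as in the example. The function name `stream_started_at` is hypothetical.

```python
from datetime import datetime, timezone


def stream_started_at(custom_parameters):
    """Parse customParameters.initial_timestamp (ISO 8601, "Z" suffix) as UTC."""
    text = custom_parameters["initial_timestamp"]
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


params = {
    "interaction_id": "073c9e0e1ab44c8a8085da2b08c1ecf9",
    "type": "inbound",
    "initial_timestamp": "2022-01-16T16:12:47.254Z",
}
started = stream_started_at(params)
```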
Media Message
The "Media" message encapsulates the raw audio data.
Parameter | Description
event | The value of media.
sequenceNumber | The number used to keep track of the message-sending order. The first message starts with "1" and is then incremented for each message.
media | An object containing media metadata and payload.
media.track | One of inbound or outbound.
media.chunk | The chunk for the message. The first message will begin with "1" and increment with each subsequent message.
media.timestamp | Presentation timestamp in milliseconds from the start of the stream.
media.payload | Raw audio encoded in base64.
streamSid | The unique identifier of the stream.
Example:
{ |
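As a rough sketch of how a receiver might unpack the fields listed in the parameter table for this message, the snippet below decodes the base64 payload into raw bytes. The sample message is hypothetical: it was assembled from the parameter table and its values are made up. Whether chunk and timestamp arrive as strings or numbers is also an assumption, so both are converted with `int()`.

```python
import base64
import json


def decode_media(raw_message):
    """Return (track, chunk, timestamp_ms, audio_bytes) from a "media" message."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        raise ValueError("not a media message")
    media = message["media"]
    audio = base64.b64decode(media["payload"])
    # int() accepts both "1" and 1, since the wire type is an assumption here.
    return media["track"], int(media["chunk"]), int(media["timestamp"]), audio


# Hypothetical message built from the parameter table; values are placeholders.
sample = json.dumps({
    "event": "media",
    "sequenceNumber": "3",
    "media": {
        "track": "inbound",
        "chunk": "1",
        "timestamp": "5",
        "payload": base64.b64encode(b"\xff" * 160).decode("ascii"),
    },
    "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
})
```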
Mark Message
When you want to be notified that the audio you have streamed has been completed, send a “Mark” message after the “Media” message.
You will receive a “Mark” event with the matching name from Talkdesk when the audio ends or if there is no buffered audio.
If a “Clear” message was sent, you will also receive a “Mark” event when the buffer clears.
Parameter | Description
event | The value of mark.
sequenceNumber | The number used to keep track of message-sending order. The first message starts with "1" and is then incremented for each message.
mark | An object containing the mark metadata.
mark.name | The value specified when creating the mark message.
Example:
{
  "event": "mark",
  "sequenceNumber": "4",
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
  "mark": {
    "name": "my label"
  }
}
Stop Message
The "Stop" message will be sent when the conversation stream is either stopped or the call has ended.
Parameter | Description
event | The value of stop.
sequenceNumber | The number used to keep track of message-sending order. The first message starts with "1" and is then incremented for each message.
stop | An object containing stream metadata.
stop.accountSid | The account identifier that created the stream.
stop.callSid | The call identifier that started the stream.
streamSid | The unique identifier of the stream.
Example:
{
  "event": "stop",
  "sequenceNumber": "5",
  "stop": {
    "accountSid": "AC123",
    "callSid": "CA123"
  },
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
Messages a Partner can send to Talkdesk
Media Message
Use the “Media” message to send an audio stream from your system to Talkdesk.
Media messages are buffered and played in the order received. To interrupt the buffered audio, send a “Clear” message.
Parameter | Description
event | The value of media.
streamSid | The SID of the stream that should play back the audio.
media | An object containing media metadata and payload.
media.payload | Raw mulaw/8000 audio encoded in base64.
Example:
{
  "event": "media",
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
  "media": {
    "payload": "a3242sadfasfa423242... (a base64 encoded string of 8000/mulaw)"
  }
}
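A minimal sketch of producing these messages from raw mu-law bytes is shown below. The function name `media_messages` is hypothetical, and the default frame size of 160 bytes (20 ms at 8000 Hz) is an assumption for illustration, not a documented requirement.

```python
import base64
import json


def media_messages(stream_sid, audio, frame_bytes=160):
    """Yield JSON "media" messages, each carrying one slice of mu-law audio.

    frame_bytes=160 (20 ms at 8000 Hz) is an assumed frame size.
    """
    for offset in range(0, len(audio), frame_bytes):
        frame = audio[offset:offset + frame_bytes]
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(frame).decode("ascii")},
        })
```

Each yielded string can be sent as one WebSocket text frame. Because messages are buffered and played in the order received, sending them in generator order preserves the audio sequence.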
Mark Message
When you want to be notified that the audio you have streamed has been completed, send a “Mark” message after the “Media” message.
You will receive a “Mark” event with the matching name from Talkdesk when the audio ends or if there is no buffered audio.
If a “Clear” message was sent, you will also receive a “Mark” event when the buffer clears.
Parameter | Description
event | The value of mark.
streamSid | The SID of the stream that should receive the mark.
mark | An object containing mark metadata and payload.
mark.name | A name of your choice that helps you recognize the matching mark event when you receive it.
Example:
{
  "event": "mark",
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
  "mark": {
    "name": "my label"
  }
}
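One way to pair the marks you send with the “Mark” events you receive is to keep a set of pending names. The class below is a hypothetical sketch of that bookkeeping. The name `MarkTracker` and its methods are illustrative only.

```python
import json


class MarkTracker:
    """Remember marks sent to Talkdesk until the matching "mark" event returns."""

    def __init__(self, stream_sid):
        self.stream_sid = stream_sid
        self.pending = set()

    def create(self, name):
        """Build the outgoing mark message and record its name as pending."""
        self.pending.add(name)
        return json.dumps({
            "event": "mark",
            "streamSid": self.stream_sid,
            "mark": {"name": name},
        })

    def acknowledge(self, raw_message):
        """Return the mark name if the incoming event matches a pending mark."""
        message = json.loads(raw_message)
        if message.get("event") != "mark":
            return None
        name = message["mark"]["name"]
        if name in self.pending:
            self.pending.discard(name)
            return name
        return None
```

A bot could send `tracker.create("greeting-done")` right after its greeting audio, then wait for `acknowledge` to return that name before listening for the caller's reply.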
Clear Message
To interrupt the audio stream, send a “Clear” message. This cancels all “Media” messages that are buffered and have not yet been played.
Clearing the buffer also causes a “Mark” event to be sent back to you.
Parameter | Description
event | The value of clear.
streamSid | The SID of the stream whose buffered audio should be cleared.
Example:
{
  "event": "clear",
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
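A typical use of “Clear” is barge-in, where the caller starts talking while bot audio is still queued. The sketch below is hypothetical. It treats marks that have not yet been echoed back as a proxy for audio that may still be buffered, which is an assumption and not something the protocol states.

```python
import json


def barge_in(stream_sid, unacknowledged_marks):
    """Return the messages to send when the caller talks over the bot."""
    if not unacknowledged_marks:
        # Assumption: no outstanding marks means no known buffered audio.
        return []
    return [json.dumps({"event": "clear", "streamSid": stream_sid})]
```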
Stop Message
Send a “Stop” message if you want to stop the audio stream and communicate the result of the operation:
- ok - The audio stream was successful.
- error - There was an error during the audio stream.
- escalate - The audio stream should stop and the call should be escalated to a live agent.
The following table shows the parameters that you should send in a “Stop” message:
Parameter | Description
event | The type of event. The value of stop.
stop | An object containing stop metadata and payload information.
stop.command | One of the "ok"/"error"/"escalate" options. Depending on the command, a different exit option is followed in the “Connect Autopilot” component.
stop.ringGroup | In case of escalation, this parameter indicates the ring group to which the call is redirected.
streamSid | The SID of the stream that should be stopped.
Example:
{
  "event": "stop",
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
  "stop": {
    "command": "escalate",
    "ringGroup": "agents"
  }
}
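A small builder can guard against sending an unsupported command. The following is a sketch under stated assumptions: `stop_message` is a hypothetical helper, and including ringGroup only for the "escalate" command is a design choice made here, not a documented rule.

```python
import json

VALID_COMMANDS = {"ok", "error", "escalate"}


def stop_message(stream_sid, command, ring_group=None):
    """Build a "stop" message; ringGroup is included only when escalating."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown stop command: {command!r}")
    stop = {"command": command}
    if command == "escalate" and ring_group is not None:
        stop["ringGroup"] = ring_group
    return json.dumps({"event": "stop", "streamSid": stream_sid, "stop": stop})
```

For example, `stop_message("MZ18ad3ab5a668481ce02b83e7395059f0", "escalate", "agents")` produces a message equivalent to the example above.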