Conversation Orchestrator: Streaming Bidirectional Audio

Note: The "Connect to Autopilot Voice" component has additional usage costs. 

Talkdesk Conversation Orchestrator enables real-time audio streaming between Talkdesk and third-party solutions.

This product shares the audio of an inbound call with an external system, which can in turn connect to a third party such as Autopilot or another system. This lets you “Bring Your Own Bot” (BYOB) into Talkdesk, using the Connect to Autopilot Voice Studio component to stream audio bidirectionally. When you choose the “External Voice Stream” option on the component, the call audio is streamed to a third-party WebSocket, together with orchestration messages and contextual information about the voice interaction.

Several types of events occur during the stream's life cycle. Talkdesk sends them to your system as the following WebSocket messages:

  • Connected: The first message sent once a WebSocket connection is established.
  • Start: The message containing the metadata about the conversation stream. It is sent immediately after the “Connected” message, and only once, at the beginning of the conversation stream.
  • Media: The message that encapsulates the raw audio data.
  • Mark: In bidirectional streams, the message sent back to you when the audio you marked has finished playing or the buffer has been cleared.
  • Stop: The message sent when the conversation stream is stopped or the call has ended.

In a bidirectional audio stream, your system can also send the following WebSocket messages to Talkdesk:

  • Media: A message that encapsulates the raw audio data to be played back.
  • Mark: A message you send after your Media messages to be notified when that audio has finished playing.
  • Stop: A message indicating that the audio stream should stop and reporting the result of the operation.
  • Clear: A message that interrupts playback by discarding all buffered Media messages that have not yet been played.

The following diagram shows how unidirectional and bidirectional audio streams work over WebSocket messages:

[Diagram: unidirectional and bidirectional audio streams over WebSocket messages]

To understand the structure of these WebSocket messages, see the WebSocket Messages Protocols section below.

The “Connect to Autopilot Voice” component can be added at any step of the Studio flow and can be configured as follows.

With the “stream and hold” capability, the stream starts when the call reaches the component and stops only when you send a “Stop” message reporting success, failure, or the need to escalate the call to an agent.

[Image: Connect to Autopilot Voice component configuration]

1. Add a Connect to Autopilot Voice component to an exit of any Studio component. In this example, it is added to the “OK” exit of the “Initial step” component for demonstration purposes.

2. On the “Connect Autopilot” component, select “External Voice Stream” and set the “Voice stream URL” to the WSS endpoint that the audio should be streamed to and from. Selecting “External Voice Stream” activates the Conversation Orchestrator, which starts triggering the events represented by WebSocket messages.

3. If the call may need to be escalated to a live agent, connect the “Escalation” exit to an “Assignment and Dial” component.

4. Set up the remaining “Connect Autopilot” exits according to the flow you want to define. In this example, the “Execution Error” exit is also connected to the “Assignment and Dial” component, so that an agent helps the caller if an error occurs.

 

WebSocket Messages Protocols 

Each message is a JSON string. Use the event property of each JSON object to determine which type of event occurred.
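As a minimal sketch, assuming your WebSocket server library hands you each text frame as a Python string, a receiver can parse every frame and route it on the event property. The `dispatch` function and the handler mapping are hypothetical names, not part of any Talkdesk SDK.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical handler signature: each handler receives the parsed message.
Handler = Callable[[Dict[str, Any]], None]


def dispatch(raw: str, handlers: Dict[str, Handler]) -> str:
    """Parse one WebSocket text frame and route it by its "event" property.

    Returns the event name. Events without a registered handler are ignored.
    """
    message = json.loads(raw)
    event = message.get("event")
    if not isinstance(event, str):
        raise ValueError("message has no string 'event' property")
    handler = handlers.get(event)
    if handler is not None:
        handler(message)
    return event


# Usage: register only the events you care about.
received = []
dispatch(
    '{"event": "connected", "protocol": "Call", "version": "1.0.0"}',
    {"connected": lambda m: received.append(m["version"])},
)
```

Ignoring unknown events instead of raising is a design choice in this sketch, so that a receiver keeps working if new message types appear.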

 

Messages sent from Talkdesk to the Partner

Connected Message

"Connected" is the first message sent once a WebSocket connection is established.

Parameters:

  • event: The value "connected".
  • protocol: The protocol used for the lifetime of the WebSocket connection, for example "Call".
  • version: The semantic version of the protocol.

 

Example: 

{ 
"event": "connected",  
"protocol": "Call", 
"version": "1.0.0"
}
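A receiver may want to check the “Connected” message before processing anything else. The sketch below assumes, hypothetically, that your integration was built against protocol major version 1 and, following semantic-versioning conventions, treats any other major version as potentially incompatible. `check_connected` and `SUPPORTED_MAJOR` are illustrative names.

```python
import json
from typing import Any, Dict

# Assumption: this integration was written against protocol version 1.x.y.
SUPPORTED_MAJOR = 1


def check_connected(raw: str) -> Dict[str, Any]:
    """Validate a "connected" message and return it parsed."""
    message = json.loads(raw)
    if message.get("event") != "connected":
        raise ValueError(f"expected 'connected', got {message.get('event')!r}")
    version = str(message.get("version", ""))
    major_text = version.split(".", 1)[0]
    if not major_text.isdigit():
        raise ValueError(f"malformed semantic version: {version!r}")
    # Under semantic versioning, a different major version may break compatibility.
    if int(major_text) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported protocol version {version}")
    return message
```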

 

Start Message

The "Start" message contains important metadata about the conversation stream and is sent immediately after the "Connected" message. It is only sent once, at the start of the conversation stream.

Parameters:

  • event: The value "start".
  • sequenceNumber: The number used to track message-sending order. The first message starts with "1" and each subsequent message increments it. The value is sent as a string, for example "2".
  • start: An object containing stream metadata.
  • start.streamSid: The unique identifier of the stream.
  • start.accountSid: The identifier of the account that created the stream.
  • start.callSid: The identifier of the call from which the stream was started.
  • start.tracks: An array indicating which media flows to expect in subsequent messages. Possible values are inbound and outbound.
  • start.customParameters: An object containing the custom parameters set when defining the stream (described below).
  • start.mediaFormat: An object describing the format of the payload in Media messages.
  • start.mediaFormat.encoding: The encoding of the upcoming payload data. The value is always audio/x-mulaw.
  • start.mediaFormat.sampleRate: The sample rate of the upcoming audio data, in hertz. The value is always 8000.
  • start.mediaFormat.channels: The number of channels in the audio data. The value is always 1.
  • streamSid: The unique identifier of the stream.

Example: 

{
  "event": "start",
  "sequenceNumber": "2",
  "start": {
    "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
    "accountSid": "AC123",
    "callSid": "CA123",
    "tracks": [
      "inbound",
      "outbound"
    ],
    "customParameters": {
      "extra_parameters": {
        "initial_timestamp": "1668428901027",
        "flow_id": ""
      },
      "account_id": "5cee471c844dda000d67a428",
      "interaction_id": "073c9e0e1ab44c8a8085da2b08c1ecf9",
      "stream_url": "wss://my.service.com/socket/messages",
      "correlation_id": "614c537a2021746aead25356",
      "business_hours": "",
      "type": "inbound",
      "initial_timestamp": "2022-01-16T16:12:47.254Z",
      "flow_id": ""
    },
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    }
  },
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
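The following sketch, assuming the JSON layout shown above, parses a “Start” message into a small Python dataclass and verifies that the media format matches the fixed values listed in the parameter descriptions. `StreamInfo` and `parse_start` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Fixed values documented for mediaFormat in the Start message.
EXPECTED_MEDIA_FORMAT = {"encoding": "audio/x-mulaw", "sampleRate": 8000, "channels": 1}


@dataclass(frozen=True)
class StreamInfo:
    """Hypothetical container for the metadata a receiver usually keeps."""
    stream_sid: str
    account_sid: str
    call_sid: str
    tracks: List[str]
    custom_parameters: Dict[str, Any] = field(default_factory=dict)


def parse_start(raw: str) -> StreamInfo:
    message = json.loads(raw)
    if message.get("event") != "start":
        raise ValueError(f"expected 'start', got {message.get('event')!r}")
    start = message["start"]
    media_format = start.get("mediaFormat", {})
    # Fail fast if the audio is not the mu-law / 8000 Hz / mono format we expect.
    for key, expected in EXPECTED_MEDIA_FORMAT.items():
        if media_format.get(key) != expected:
            raise ValueError(f"unexpected mediaFormat.{key}: {media_format.get(key)!r}")
    return StreamInfo(
        stream_sid=start["streamSid"],
        account_sid=start["accountSid"],
        call_sid=start["callSid"],
        tracks=list(start.get("tracks", [])),
        custom_parameters=dict(start.get("customParameters", {})),
    )
```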

To enrich the Start message produced by the Talkdesk Global Communications Network (GCN), Talkdesk adds the following information to the customParameters object of the payload:

  • extra_parameters: Deprecated field.
  • account_id: The Talkdesk ID of the account.
  • interaction_id: The unique ID of the Talkdesk interaction.
  • stream_url: The WebSocket URL to which the audio is streamed.
  • correlation_id: The ID that identifies the call throughout its lifetime, shared by all interaction_id values associated with that call.
  • business_hours: Empty for unidirectional audio streams. For bidirectional streams, it indicates the agent's business hours information.
  • type: The media flow, either inbound or outbound.
  • initial_timestamp: The moment the stream started, as an ISO 8601 timestamp (for example, 2022-01-16T16:12:47.254Z).
  • flow_id: Deprecated field.

Example: 

"customParameters": {
  "extra_parameters": {
    "initial_timestamp": "1668428901027",
    "flow_id": ""
  },
  "account_id": "5cee471c844dda000d67a428",
  "interaction_id": "073c9e0e1ab44c8a8085da2b08c1ecf9",
  "stream_url": "wss://my.service.com/socket/messages",
  "correlation_id": "614c537a2021746aead25356",
  "business_hours": "",
  "type": "inbound",
  "initial_timestamp": "2022-01-16T16:12:47.254Z",
  "flow_id": ""
}
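As an illustration, a receiver might keep only the non-deprecated customParameters fields and convert initial_timestamp into a timezone-aware datetime. The trailing "Z" is rewritten to "+00:00" because `datetime.fromisoformat` only accepts "Z" from Python 3.11 onward. `CallContext` and `parse_custom_parameters` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional

# Fields the article marks as deprecated; this sketch never reads them.
DEPRECATED_KEYS = {"extra_parameters", "flow_id"}


@dataclass(frozen=True)
class CallContext:
    account_id: str
    interaction_id: str
    correlation_id: str
    media_type: str          # "inbound" or "outbound"
    business_hours: str      # empty for unidirectional streams
    started_at: Optional[datetime]


def parse_custom_parameters(params: Dict[str, Any]) -> CallContext:
    current = {k: v for k, v in params.items() if k not in DEPRECATED_KEYS}
    raw_ts = current.get("initial_timestamp") or ""
    started_at = None
    if raw_ts:
        # Rewrite a trailing "Z" so fromisoformat accepts it before Python 3.11.
        started_at = datetime.fromisoformat(raw_ts.replace("Z", "+00:00"))
    return CallContext(
        account_id=current["account_id"],
        interaction_id=current["interaction_id"],
        correlation_id=current["correlation_id"],
        media_type=current.get("type", ""),
        business_hours=current.get("business_hours", ""),
        started_at=started_at,
    )
```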

 

Media Message

The "Media" message encapsulates the raw audio data.

Parameters:

  • event: The value "media".
  • sequenceNumber: The number used to track message-sending order. The first message starts with "1" and each subsequent message increments it.
  • media: An object containing media metadata and payload.
  • media.track: Either inbound or outbound.
  • media.chunk: The chunk number of the message. The first media message starts with "1" and each subsequent message increments it.
  • media.timestamp: The presentation timestamp, in milliseconds from the start of the stream.
  • media.payload: The raw audio, encoded in base64.
  • streamSid: The unique identifier of the stream.

 

Example: 

{ 
"event": "media",
"sequenceNumber": "3", 
"media": { 
  "track": "outbound", 
  "chunk": "1", 
  "timestamp": "5",
  "payload": "no+JhoaJjpzSHxAKBgYJDhtEopGKh4aIjZm7JhILBwYIDRg1qZSLh4aIjJevLBUMBwYHDBUsr5eMiIaHi5SpNRgNCAYHCxImu5mNiIaHipGiRBsOCQYGChAf0pyOiYaGiY+e/x4PCQYGCQ4cUp+QioaGiY6bxCIRCgcGCA0ZO6aSi4eGiI2YtSkUCwcGCAwXL6yVjIeGh4yVrC8XDAgGBwsUKbWYjYiGh4uSpjsZDQgGBwoRIsSbjomGhoqQn1IcDgkGBgkPHv+ej4mGhomOnNIfEAoGBgkOG0SikYqHhoiNmbsmEgsHBggNGDWplIuHhoiMl68sFQwHBgcMFSyvl4yIhoeLlKk1GA0IBgcLEia7mY2IhoeKkaJEGw4JBgYKEB/SnI6JhoaJj57/Hg8JBgYJDhxSn5CKhoaJjpvEIhEKBwYIDRk7ppKLh4aIjZi1KRQLBwYIDBcvrJWMh4aHjJWsLxcMCAYHCxQptZiNiIaHi5KmOxkNCAYHChEixJuOiYaGipCfUhwOCQYGCQ8e/56PiYaGiY6c0h8QCgYGCQ4bRKKRioeGiI2ZuyYSCwcGCA0YNamUi4eGiIyXrywVDAcGBwwVLK+XjIiGh4uUqTUYDQgGBwsSJruZjYiGh4qRokQbDgkGBgoQH9KcjomGhomPnv8eDwkGBgkOHFKfkIqGhomOm8QiEQoHBggNGTumkouHhoiNmLUpFAsHBggMFy+slYyHhoeMlawvFwwIBgcLFCm1mI2IhoeLkqY7GQ0IBgcKESLEm46JhoaKkJ9SHA4JBgYJDx7/no+JhoaJjpzSHxAKBgYJDhtEopGKh4aIjZm7JhILBwYIDRg1qZSLh4aIjJevLBUMBwYHDBUsr5eMiIaHi5SpNRgNCAYHCxImu5mNiIaHipGiRBsOCQYGChAf0pyOiYaGiY+e/x4PCQYGCQ4cUp+QioaGiY6bxCIRCgcGCA0ZO6aSi4eGiI2YtSkUCwcGCAwXL6yVjIeGh4yVrC8XDAgGBwsUKbWYjYiGh4uSpjsZDQgGBwoRIsSbjomGhoqQn1IcDgkGBgkPHv+ej4mGhomOnNIfEAoGBgkOG0SikYqHhoiNmbsmEgsHBggNGDWplIuHhoiMl68sFQwHBgcMFSyvl4yIhoeLlKk1GA0IBgcLEia7mY2IhoeKkaJEGw4JBgYKEB/SnI6JhoaJj57/Hg8JBgYJDhxSn5CKhoaJjpvEIhEKBwYIDRk7ppKLh4aIjZi1KRQLBwYIDBcvrJWMh4aHjJWsLxcMCAYHCxQptZiNiIaHi5KmOxkNCAYHChEixJuOiYaGipCfUhwOCQYGCQ8e/56PiYaGiY6c0h8QCgYGCQ4bRKKRioeGiA=="
} ,
"streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
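Because the Start message fixes the format at audio/x-mulaw, 8000 Hz, mono, each decoded payload byte is one G.711 mu-law sample lasting 1/8000 s. The sketch below decodes a media.payload value into signed 16-bit PCM samples using a hand-written standard G.711 mu-law expansion (Python's `audioop` module was removed in Python 3.13). The function names are hypothetical.

```python
import base64
from typing import List

MULAW_BIAS = 0x84          # standard G.711 mu-law bias (132)
SAMPLE_RATE_HZ = 8000      # from start.mediaFormat.sampleRate


def mulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~code & 0xFF                     # mu-law bytes are stored inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if u & 0x80 else magnitude


def decode_media_payload(payload_b64: str) -> List[int]:
    """Turn a media.payload string into a list of PCM16 samples."""
    return [mulaw_to_linear(b) for b in base64.b64decode(payload_b64)]


def payload_duration_ms(payload_b64: str) -> float:
    # One byte per sample, one channel: byte count equals sample count.
    return len(base64.b64decode(payload_b64)) * 1000 / SAMPLE_RATE_HZ
```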

Mark Message

Talkdesk sends a “Mark” message in response to a “Mark” message that you sent after your “Media” messages. It carries the matching name and is sent when that audio finishes playing, or right away if there is no buffered audio.

If you sent a “Clear” message, you also receive a “Mark” event when the buffer is cleared.

Parameters:

  • event: The value "mark".
  • sequenceNumber: The number used to track message-sending order. The first message starts with "1" and each subsequent message increments it.
  • streamSid: The unique identifier of the stream.
  • mark: An object containing the mark metadata.
  • mark.name: The name you specified in the “Mark” message you sent.

Example: 

{ 
"event": "mark",
"sequenceNumber": "4",
"streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
"mark": {
  "name": "my label"
}
}


Stop Message

The "Stop" message will be sent when the conversation stream is either stopped or the call has ended.

Parameters:

  • event: The value "stop".
  • sequenceNumber: The number used to track message-sending order. The first message starts with "1" and each subsequent message increments it.
  • stop: An object containing stream metadata.
  • stop.accountSid: The identifier of the account that created the stream.
  • stop.callSid: The identifier of the call that started the stream.
  • streamSid: The unique identifier of the stream.

Example: 

{ 
"event": "stop",
"sequenceNumber": "5",
"stop": {
   "accountSid": "AC123",
   "callSid": "CA123"
 },
 "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0" 
}
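Every message from Talkdesk except “Connected” carries a sequenceNumber, so a receiver can use it to detect gaps or reordering. This sketch, using a hypothetical `SequenceTracker` class, converts the string value to an integer and records any number that is not exactly one more than the previous one. It does not assume which value the first numbered message carries.

```python
from typing import Any, Dict, List, Optional


class SequenceTracker:
    """Flags gaps or reordering in Talkdesk's sequenceNumber values."""

    def __init__(self) -> None:
        self.last: Optional[int] = None
        self.anomalies: List[str] = []

    def observe(self, message: Dict[str, Any]) -> None:
        raw = message.get("sequenceNumber")
        if raw is None:
            return  # e.g. the "connected" message has no sequenceNumber
        current = int(raw)  # sent as a string, such as "3"
        if self.last is not None and current != self.last + 1:
            self.anomalies.append(f"expected {self.last + 1}, got {current}")
        self.last = current
```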

Messages a Partner can send to Talkdesk 

Media Message

Use the “Media” message to send audio from your system to Talkdesk.

Talkdesk buffers media messages and plays them in the order received. To interrupt the buffered audio, send a “Clear” message.

Parameters:

  • event: The value "media".
  • streamSid: The SID of the stream that should play back the audio.
  • media: An object containing media metadata and payload.
  • media.payload: Raw mulaw/8000 audio, encoded in base64.

Example: 

{
 "event": "media",
 "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
 "media": {
   "payload": "a3242sadfasfa423242... (a base64 encoded string of 8000/mulaw)"
 }
}
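Sending audio requires the reverse of decoding: encode signed 16-bit PCM at 8000 Hz as mu-law, base64-encode it, and wrap it in “Media” messages. The sketch below uses a hand-written G.711 mu-law compressor and splits the audio into 160-byte (20 ms) chunks; that chunk size is an assumption for illustration, not a documented Talkdesk requirement. `media_messages` is a hypothetical helper, and the input must already be 8000 Hz mono.

```python
import base64
import json
from typing import Iterator, Sequence

MULAW_BIAS = 0x84
MULAW_CLIP = 32635         # largest magnitude before adding the bias
CHUNK_BYTES = 160          # assumption: 20 ms frames at 8000 samples/s


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), MULAW_CLIP) + MULAW_BIAS
    # Find the segment (exponent) from the highest set bit above bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not magnitude & mask:
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def media_messages(stream_sid: str, pcm16: Sequence[int]) -> Iterator[str]:
    """Yield JSON "media" messages carrying base64-encoded mu-law audio."""
    encoded = bytes(linear_to_mulaw(s) for s in pcm16)
    for offset in range(0, len(encoded), CHUNK_BYTES):
        chunk = encoded[offset:offset + CHUNK_BYTES]
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        })
```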


Mark Message

To be notified when the audio you streamed has finished playing, send a “Mark” message after your “Media” messages.

Talkdesk sends back a “Mark” event with the matching name when that audio ends, or right away if there is no buffered audio.

If you send a “Clear” message, you also receive a “Mark” event when the buffer is cleared.

Parameters:

  • event: The value "mark".
  • streamSid: The SID of the stream that should receive the mark.
  • mark: An object containing mark metadata.
  • mark.name: A name of your choice, used to recognize the matching “Mark” event you receive later.

Example: 

{ 
"event": "mark",
"streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
"mark": {
  "name": "my label"
}
}
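A simple way to use marks is to name one after each bot utterance and keep track of the names still awaiting confirmation. The hypothetical `MarkTracker` below builds outgoing “Mark” messages and records the matching events received from Talkdesk; it assumes each mark name is used only once per stream.

```python
import json
from typing import Any, Dict, List


class MarkTracker:
    """Hypothetical helper pairing outgoing marks with Talkdesk's echoes."""

    def __init__(self, stream_sid: str) -> None:
        self.stream_sid = stream_sid
        self.pending: List[str] = []     # mark names sent, not yet echoed
        self.completed: List[str] = []   # mark names echoed back, in order

    def mark_message(self, name: str) -> str:
        """Build a "mark" message to send right after your media messages."""
        self.pending.append(name)
        return json.dumps({"event": "mark", "streamSid": self.stream_sid, "mark": {"name": name}})

    def on_mark(self, message: Dict[str, Any]) -> str:
        """Record a "mark" event received from Talkdesk and return its name."""
        name = message["mark"]["name"]
        if name in self.pending:
            self.pending.remove(name)
        self.completed.append(name)
        return name
```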


Clear Message

To interrupt the audio stream, send a “Clear” message. This cancels all buffered “Media” messages that have not yet been played, empties the audio buffer, and causes a “Mark” event to be sent back to you.

Parameters:

  • event: The value "clear".
  • streamSid: The SID of the stream whose buffered audio should be cleared.

Example:

{
"event": "clear",
"streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
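One common use of “Clear” is barge-in: if the caller starts speaking while bot audio is still buffered, the bot discards the rest. The hypothetical `BargeIn` helper below treats a mark that has not yet come back as a sign that audio may still be buffered, which assumes you send a mark after each utterance. How you detect caller speech is outside the scope of this sketch.

```python
import json
from typing import Optional, Set


class BargeIn:
    """Hypothetical barge-in helper: clear buffered bot audio if the caller talks."""

    def __init__(self, stream_sid: str) -> None:
        self.stream_sid = stream_sid
        self.pending_marks: Set[str] = set()

    def sent_mark(self, name: str) -> None:
        self.pending_marks.add(name)

    def received_mark(self, name: str) -> None:
        self.pending_marks.discard(name)

    def on_caller_speech(self) -> Optional[str]:
        """Return a "clear" message to send, or None if nothing seems buffered."""
        if not self.pending_marks:
            return None
        # Marks stay pending: Talkdesk sends "mark" events once the buffer clears.
        return json.dumps({"event": "clear", "streamSid": self.stream_sid})
```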


Stop Message

Send a “Stop” message if you want to stop the audio stream and communicate the result of the operation:

  • ok - The audio stream was successful. 
  • error - There was an error during the audio stream.
  • escalate - The audio stream should stop and the call should be escalated to a live agent.

A “Stop” message contains the following parameters:

  • event: The value "stop".
  • stop: An object containing stop metadata.
  • stop.command: One of "ok", "error", or "escalate". Depending on the command, the “Connect Autopilot” component follows a different exit.
  • stop.ringGroup: In case of escalation, the ring group to which the call is redirected.
  • streamSid: The SID of the stream to stop.

 

Example:

{
 "event": "stop",
 "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
 "stop": {
   "command": "escalate",
   "ringGroup": "agents"
 }
}
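The sketch below builds a partner “Stop” message and rejects any command other than "ok", "error", or "escalate". Attaching ringGroup only to an "escalate" command, and rejecting it otherwise, is a design choice for this hypothetical `stop_message` helper, based on the parameter description above rather than on a documented validation rule.

```python
import json
from typing import Dict, Optional

VALID_COMMANDS = {"ok", "error", "escalate"}


def stop_message(stream_sid: str, command: str, ring_group: Optional[str] = None) -> str:
    """Build a JSON "stop" message; each command maps to a component exit."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"command must be one of {sorted(VALID_COMMANDS)}")
    stop: Dict[str, str] = {"command": command}
    if ring_group is not None:
        # Design choice: ringGroup is only described for escalation.
        if command != "escalate":
            raise ValueError("ringGroup only applies to the escalate command")
        stop["ringGroup"] = ring_group
    return json.dumps({"event": "stop", "streamSid": stream_sid, "stop": stop})
```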