Conversation Orchestrator: Streaming Bidirectional Audio Stream – Knowledge Base

Note: The "Connect to Autopilot Voice" component has additional usage costs.

Talkdesk Conversation Orchestrator enables real-time audio streaming between Talkdesk and third-party solutions.

Conversation Orchestrator shares the audio of an inbound call with an external system, which can in turn be connected to a third party such as Autopilot or to another system. This allows you to “Bring Your Own Bot” (BYOB) into Talkdesk by using the Connect to Autopilot Voice Studio component to stream the audio bidirectionally. When you choose the “External Voice Stream” option on the component, the call audio is streamed to a third-party WebSocket, together with orchestration messages and contextual information about the voice interaction.

Several types of events occur during the stream's life cycle. Each event is represented by a WebSocket message:

Connected: The first message sent, once a WebSocket connection is established.
Start: The message containing necessary metadata about the conversation stream that is sent immediately after the “Connected” message. It is only sent once, at the beginning of the conversation stream.
Media: The message that encapsulates the raw audio data.
Stop: A stop message is sent when the conversation stream is either stopped or the call has ended.

When doing a bi-directional audio stream, you can send the following WebSocket messages from your system to Talkdesk:

Media: A message in which you can encapsulate the raw audio data.
Mark: A message that you can send after the Media message to get notified when the audio stream sent was completed.
Stop: A message indicating that the audio stream should stop and what the result of the operation is.
Clear: A message interrupting the audio streams of all Media messages.

The following diagram shows how both unidirectional and bidirectional audio streams work over WebSocket messages:

To understand the structure of these WebSocket messages, please review the WebSocket Messages protocols section below.

The “Connect to Autopilot Voice” component can be added at any step of the Studio flow and can be configured as follows.

With the “stream and hold” capability, the stream starts when the call reaches the component and stops only when you send a “Stop” message that reports success, failure, or the need to escalate the call to an agent.

1. Add a Connect to Autopilot Voice component to an exit of any Studio component. In this example, it is added to the “OK” exit of the "Initial step" component for demonstration purposes.

2. On the “Connect Autopilot” component, select “External Voice Stream” and configure the “Voice stream URL” to the WSS connection you want the audio to be streamed to and from. The selection of “External Voice Stream” activates the Conversation Orchestrator that starts triggering different events represented via WebSocket messages.

3. If you need the call to be escalated to a live agent, then configure the “Escalation” exit and add an Assignment and Dial component step.

4. Set up the “Connect Autopilot” exits, depending on the flow you would like to define. In this example, the “Execution Error” exit was also configured to the “Assignment and Dial” so, in case of an error, an agent helps the caller.

WebSocket Messages Protocols

Each message sent is a JSON string. You can determine which type of event is occurring by using the event property of every JSON object.
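
As an illustration of routing on the event property, the following Python sketch parses one incoming text frame and hands it to a handler chosen by event name. The function names (parse_event, dispatch) and the handlers dictionary are hypothetical and not part of any Talkdesk SDK; the set of event names is taken from the list of messages described in this article.

```python
import json

# Events Talkdesk sends to the partner, as listed in this article.
TALKDESK_EVENTS = {"connected", "start", "media", "mark", "stop"}


def parse_event(raw: str) -> tuple:
    """Parse one WebSocket text frame and return (event name, full message)."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("message is not a JSON object")
    event = message.get("event")
    if not isinstance(event, str):
        raise ValueError("message has no 'event' property")
    return event, message


def dispatch(raw: str, handlers: dict):
    """Route a message to the handler registered for its event type."""
    event, message = parse_event(raw)
    if event not in TALKDESK_EVENTS:
        raise ValueError(f"unexpected event: {event}")
    handler = handlers.get(event)
    # Events without a registered handler are ignored.
    return handler(message) if handler else None
```

A caller would typically register one handler per event, for example dispatch(frame, {"start": on_start, "media": on_media}), where on_start and on_media are your own functions.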

Messages sent from Talkdesk to the Partner

Connected Message

"Connected" is the first message sent once a WebSocket connection is established.

Parameter	Description
event	The value of connected
protocol	Defines the protocol for the lifetime of the WebSocket connection, e.g. "Call".
version	Semantic version of the protocol.

Example:

{ 
 "event": "connected",  
 "protocol": "Call", 
 "version": "1.0.0"
}

Start Message

The "Start" message contains important metadata about the conversation stream and is sent immediately after the "Connected" message. It is only sent once, at the start of the conversation stream.

Parameter	Description
event	The value of start
sequenceNumber	The number used to keep track of message-sending order. The first message starts with "1" and is then incremented.
start	An object containing stream metadata.
start.streamSid	The unique identifier of the stream.
start.accountSid	The account identifier that created the stream.
start.callSid	The call identifier from where the stream was started.
start.tracks	An array of values that indicates which media flows to expect in subsequent messages. Possible values are inbound and outbound.
start.customParameters	An object that represents the Custom Parameters set when defining the Stream (explained below).
start.mediaFormat	An object containing the format of the payload in the media messages.
start.mediaFormat.encoding	The encoding of the data in the upcoming payload. The value will always be audio/x-mulaw.
start.mediaFormat.sampleRate	The sample rate in Hertz of the upcoming audio data. The value is always 8000.
start.mediaFormat.channels	The number of channels in the input audio data. The value will always be 1.
streamSid	The unique identifier of the stream.

Example:

{ 
 "event": "start",  
 "sequenceNumber": "2", 
 "start": { 
   "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0", 
   "accountSid": "AC123", 
   "callSid": "CA123", 
   "tracks": [ 
     "inbound", 
     "outbound" 
   ],
   "customParameters": {
     "extra_parameters": {
       "initial_timestamp": "1668428901027",
       "flow_id": ""
     },
     "account_id": "5cee471c844dda000d67a428",
     "interaction_id": "073c9e0e1ab44c8a8085da2b08c1ecf9",
     "stream_url": "wss://my.service.com/socket/messages",
     "correlation_id": "614c537a2021746aead25356",
     "business_hours": "",
     "type": "inbound",
     "initial_timestamp": "2022-01-16T16:12:47.254Z",
     "flow_id": ""
   },
   "mediaFormat": { 
     "encoding": "audio/x-mulaw", 
     "sampleRate": 8000, 
     "channels": 1 
   } 
 },
"streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
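
A partner service will usually want to keep the identifiers from the “Start” message for the rest of the session. The sketch below, in Python, reads the fields documented in the table above into a small record and checks that the media format matches the fixed values the article states (audio/x-mulaw, 8000 Hz, 1 channel). The StreamInfo class and read_start function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class StreamInfo:
    stream_sid: str
    account_sid: str
    call_sid: str
    tracks: list
    interaction_id: str


def read_start(message: dict) -> StreamInfo:
    """Extract session metadata from a parsed "Start" message."""
    start = message["start"]
    fmt = start["mediaFormat"]
    # The article documents these three values as fixed.
    actual = (fmt["encoding"], fmt["sampleRate"], fmt["channels"])
    if actual != ("audio/x-mulaw", 8000, 1):
        raise ValueError(f"unexpected media format: {actual}")
    params = start.get("customParameters", {})
    return StreamInfo(
        stream_sid=start["streamSid"],
        account_sid=start["accountSid"],
        call_sid=start["callSid"],
        tracks=list(start["tracks"]),
        interaction_id=params.get("interaction_id", ""),
    )
```

The streamSid stored here is the value you need to echo back in every message you send to Talkdesk.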

To enrich the start message produced by Talkdesk Global Communications Network (GCN), we'll add the following information to the "customParameters" section of the payload:

extra_parameters: deprecated field.
account_id: The Talkdesk ID for the account.
interaction_id: The unique ID of that Talkdesk interaction.
stream_url: The WebSocket URL where the audio is being streamed.
correlation_id: The ID that identifies the call throughout its lifetime, across all interaction_id values associated with that call.
business_hours: Empty for unidirectional audio streams. For bidirectional streams, it indicates the agent's business hours information.
type: The media flow, either inbound or outbound.
initial_timestamp: The timestamp of the moment the stream started.
flow_id: deprecated field.

Example:

"customParameters": {
  "extra_parameters": {
    "initial_timestamp": "1668428901027",
    "flow_id": ""
  },
  "account_id": "5cee471c844dda000d67a428",
  "interaction_id": "073c9e0e1ab44c8a8085da2b08c1ecf9",
  "stream_url": "wss://my.service.com/socket/messages",
  "correlation_id": "614c537a2021746aead25356",
  "business_hours": "",
  "type": "inbound",
  "initial_timestamp": "2022-01-16T16:12:47.254Z",
  "flow_id": ""
}

Media Message

The "Media" message encapsulates the raw audio data.

Parameter	Description
event	The value of media.
sequenceNumber	The number used to keep track of the message-sending order. The first message starts with "1" and then is incremented for each message.
media	An object containing media metadata and payload.
media.track	One of inbound or outbound.
media.chunk	The chunk for the message. The first message will begin with "1" and increment with each subsequent message.
media.timestamp	Presentation Timestamp in Milliseconds from the start of the stream.
media.payload	The raw audio, encoded in base64.
streamSid	The unique identifier of the Stream

Example:

{ 
 "event": "media",
 "sequenceNumber": "3", 
 "media": { 
   "track": "outbound", 
   "chunk": "1", 
   "timestamp": "5",
   "payload": "no+JhoaJjpzSHxAKBgYJDhtEopGKh4aIjZm7JhILBwYIDRg1qZSLh4aIjJevLBUMBwYHDBUsr5eMiIaHi5SpNRgNCAYHCxImu5mNiIaHipGiRBsOCQYGChAf0pyOiYaGiY+e/x4PCQYGCQ4cUp+QioaGiY6bxCIRCgcGCA0ZO6aSi4eGiI2YtSkUCwcGCAwXL6yVjIeGh4yVrC8XDAgGBwsUKbWYjYiGh4uSpjsZDQgGBwoRIsSbjomGhoqQn1IcDgkGBgkPHv+ej4mGhomOnNIfEAoGBgkOG0SikYqHhoiNmbsmEgsHBggNGDWplIuHhoiMl68sFQwHBgcMFSyvl4yIhoeLlKk1GA0IBgcLEia7mY2IhoeKkaJEGw4JBgYKEB/SnI6JhoaJj57/Hg8JBgYJDhxSn5CKhoaJjpvEIhEKBwYIDRk7ppKLh4aIjZi1KRQLBwYIDBcvrJWMh4aHjJWsLxcMCAYHCxQptZiNiIaHi5KmOxkNCAYHChEixJuOiYaGipCfUhwOCQYGCQ8e/56PiYaGiY6c0h8QCgYGCQ4bRKKRioeGiI2ZuyYSCwcGCA0YNamUi4eGiIyXrywVDAcGBwwVLK+XjIiGh4uUqTUYDQgGBwsSJruZjYiGh4qRokQbDgkGBgoQH9KcjomGhomPnv8eDwkGBgkOHFKfkIqGhomOm8QiEQoHBggNGTumkouHhoiNmLUpFAsHBggMFy+slYyHhoeMlawvFwwIBgcLFCm1mI2IhoeLkqY7GQ0IBgcKESLEm46JhoaKkJ9SHA4JBgYJDx7/no+JhoaJjpzSHxAKBgYJDhtEopGKh4aIjZm7JhILBwYIDRg1qZSLh4aIjJevLBUMBwYHDBUsr5eMiIaHi5SpNRgNCAYHCxImu5mNiIaHipGiRBsOCQYGChAf0pyOiYaGiY+e/x4PCQYGCQ4cUp+QioaGiY6bxCIRCgcGCA0ZO6aSi4eGiI2YtSkUCwcGCAwXL6yVjIeGh4yVrC8XDAgGBwsUKbWYjYiGh4uSpjsZDQgGBwoRIsSbjomGhoqQn1IcDgkGBgkPHv+ej4mGhomOnNIfEAoGBgkOG0SikYqHhoiNmbsmEgsHBggNGDWplIuHhoiMl68sFQwHBgcMFSyvl4yIhoeLlKk1GA0IBgcLEia7mY2IhoeKkaJEGw4JBgYKEB/SnI6JhoaJj57/Hg8JBgYJDhxSn5CKhoaJjpvEIhEKBwYIDRk7ppKLh4aIjZi1KRQLBwYIDBcvrJWMh4aHjJWsLxcMCAYHCxQptZiNiIaHi5KmOxkNCAYHChEixJuOiYaGipCfUhwOCQYGCQ8e/56PiYaGiY6c0h8QCgYGCQ4bRKKRioeGiA=="
 } ,
 "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
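
To work with the audio in a “Media” message, the payload first has to be base64-decoded. The Python sketch below does that and derives the playback duration from the documented format (mu-law, 8000 Hz, one channel, which means one byte per sample). The mulaw_to_pcm16 helper is the standard G.711 mu-law expansion and is included only as an assumption about what a downstream speech engine might need; the article itself does not prescribe any conversion. All function names are hypothetical.

```python
import base64

SAMPLE_RATE_HZ = 8000  # documented value of start.mediaFormat.sampleRate


def decode_payload(message: dict) -> bytes:
    """Return the raw mu-law bytes carried by a parsed "Media" message."""
    return base64.b64decode(message["media"]["payload"])


def duration_ms(mulaw_audio: bytes) -> float:
    """Duration of the audio: one byte per sample, one channel."""
    return len(mulaw_audio) * 1000 / SAMPLE_RATE_HZ


def mulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF           # mu-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def to_pcm16(mulaw_audio: bytes) -> list:
    """Convert a whole chunk of mu-law audio to linear samples."""
    return [mulaw_to_pcm16(b) for b in mulaw_audio]
```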

Mark Message

When you want to be notified that the audio you have streamed has been completed, send a “Mark” message after the “Media” message.

You will receive a “Mark” event with the matching name from Talkdesk when the audio ends or if there is no buffered audio.

In case the “Clear” message was used, you will also receive a “Mark” event when the buffer clears.

Parameter	Description
event	The value of mark
sequenceNumber	Number used to keep track of message sending order. The first message starts with "1" and then is incremented for each message.
mark	An object containing the mark metadata
mark.name	The value specified when creating the mark message

Example:

{ 
 "event": "mark",
 "sequenceNumber": "4",
 "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
 "mark": {
   "name": "my label"
 }
}

Stop Message

The "Stop" message will be sent when the conversation stream is either stopped or the call has ended.

Parameter	Description
event	The value of stop
sequenceNumber	Number used to keep track of message sending order. The first message starts with "1" and then is incremented for each message.
stop	An object containing Stream metadata
stop.accountSid	The Account identifier that created the Stream
stop.callSid	The Call identifier that started the Stream
streamSid	The unique identifier of the Stream

Example:

{ 
 "event": "stop",
 "sequenceNumber": "5",
 "stop": {
    "accountSid": "AC123",
    "callSid": "CA123"
  },
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0" 
}
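
Because sequenceNumber arrives as a string and increments with each message, a receiver can use it to notice gaps or reordering. The following Python sketch is one possible way to do that; the SequenceChecker class is a hypothetical helper. It accepts whatever number it sees first, since the “Connected” example above carries no sequenceNumber, and it treats messages without the field as in order.

```python
class SequenceChecker:
    """Tracks sequenceNumber values and reports whether each is in order."""

    def __init__(self):
        self.last = None

    def check(self, message: dict) -> bool:
        raw = message.get("sequenceNumber")
        if raw is None:
            # Some messages, such as "connected", carry no sequence number.
            return True
        current = int(raw)  # the value is sent as a string, e.g. "3"
        in_order = self.last is None or current == self.last + 1
        self.last = current
        return in_order
```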

Messages a Partner can send to Talkdesk

Media Message

You need to use the “Media” message to send an audio stream from your system to Talkdesk.

The media messages will be buffered and played in the order received. To interrupt the buffered audio, you need to send a “Clear” message.

Parameter	Description
event	The value of media
streamSid	The SID of the Stream that should play back the audio
media	An object containing media metadata and payload
media.payload	Raw mulaw/8000 audio, encoded in base64.

Example:

{
  "event": "media",
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
  "media": {
    "payload": "a3242sadfasfa423242... (a base64 encoded string of 8000/mulaw)"
  }
}
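
The message above can be produced from raw audio bytes with a few lines of Python. This sketch splits mu-law audio into pieces and wraps each piece in a “Media” message. The function name media_messages is hypothetical, and the default chunk size of 160 bytes (20 ms at 8000 Hz) is an assumption made for the example; the article does not specify a required or recommended chunk size.

```python
import base64
import json


def media_messages(stream_sid: str, mulaw_audio: bytes, chunk_bytes: int = 160):
    """Yield one JSON "media" message per chunk of mu-law/8000 audio."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for offset in range(0, len(mulaw_audio), chunk_bytes):
        chunk = mulaw_audio[offset:offset + chunk_bytes]
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        })
```

Each yielded string can be sent as one WebSocket text frame, in order, so that Talkdesk buffers and plays the audio in the same order.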

Mark Message

When you want to be notified that the audio you have streamed has been completed, send a “Mark” message after the “Media” message.

You will receive a “Mark” event with the matching name from Talkdesk when the audio ends or if there is no buffered audio.

In case the “Clear” message was used, you will also receive a “Mark” event when the buffer clears.

Parameter	Description
event	The value of mark
streamSid	The SID of the Stream that should receive the mark
mark	An object containing mark metadata and payload
mark.name	A name of your choice that helps you recognize the matching “Mark” event when you receive it.

Example:

{ 
 "event": "mark",
 "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
 "mark": {
   "name": "my label"
 }
}
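
One way to use marks is to remember each name you send and tick it off when Talkdesk echoes it back. The Python sketch below illustrates that bookkeeping; the MarkTracker class and its method names are hypothetical.

```python
import json


class MarkTracker:
    """Remembers marks that were sent and matches them with returned events."""

    def __init__(self, stream_sid: str):
        self.stream_sid = stream_sid
        self.pending = set()

    def mark_message(self, name: str) -> str:
        """Build the JSON "mark" message to send after a "media" message."""
        self.pending.add(name)
        return json.dumps({
            "event": "mark",
            "streamSid": self.stream_sid,
            "mark": {"name": name},
        })

    def on_mark(self, message: dict) -> bool:
        """Handle a "mark" event from Talkdesk; True if it was expected."""
        name = message["mark"]["name"]
        if name in self.pending:
            self.pending.discard(name)
            return True
        return False
```

When pending becomes empty, every piece of audio you marked has either finished playing or been cleared from the buffer.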

Clear Message

To interrupt the audio stream, send a “Clear” message. This cancels all “Media” messages that are buffered and have not yet been played.

This will empty all buffered audio and cause a “Mark” event to be sent back to you.

Parameter	Description
event	The value of clear
streamSid	The SID of the Stream whose buffered audio should be cleared

Example:

{
 "event": "clear",
 "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0"
}
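
A typical use of “Clear” is barge-in: the caller starts speaking while your bot's audio is still buffered, so you discard what has not been played. The Python sketch below builds the message with a JSON serializer, which guarantees syntactically valid output. The function name clear_message is hypothetical, and deciding when the caller has started speaking is outside the scope of this article.

```python
import json


def clear_message(stream_sid: str) -> str:
    """Build the JSON "clear" message for the given stream."""
    if not stream_sid:
        raise ValueError("stream_sid is required")
    return json.dumps({"event": "clear", "streamSid": stream_sid})
```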

Stop Message

Send a “Stop” message if you want to stop the audio stream and communicate the result of the operation:

ok - The audio stream was successful.
error - There was an error during the audio stream.
escalate - The audio stream should stop and the call should be escalated to a live agent.

The following table shows the parameters that you should send in a “Stop” message:

Parameter	Description
event	The type of event.
stop	An object containing stop metadata and payload information.
stop.command	One of the "ok"/"error"/"escalate" options. Depending on the command, a different exit option is followed in the “Connect Autopilot” component.
stop.ringGroup	In case of escalation, this parameter indicates the ring group to which the call is redirected.
streamSid	The SID of the stream that should be stopped.

Example:

{
  "event": "stop",
  "streamSid": "MZ18ad3ab5a668481ce02b83e7395059f0",
  "stop": {
    "command": "escalate",
    "ringGroup": "agents"
  }
}
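
The following Python sketch builds a “Stop” message and rejects commands other than the three documented ones. The function name stop_message is hypothetical. Refusing a ringGroup for commands other than "escalate" is a design choice made for this example, based on the table's statement that the parameter applies in case of escalation; the article does not say how Talkdesk treats it otherwise.

```python
import json
from typing import Optional

STOP_COMMANDS = {"ok", "error", "escalate"}


def stop_message(stream_sid: str, command: str,
                 ring_group: Optional[str] = None) -> str:
    """Build the JSON "stop" message that ends the stream with a result."""
    if command not in STOP_COMMANDS:
        raise ValueError(f"unknown stop command: {command}")
    if ring_group is not None and command != "escalate":
        raise ValueError("ringGroup only applies to the escalate command")
    stop = {"command": command}
    if ring_group is not None:
        stop["ringGroup"] = ring_group
    return json.dumps({"event": "stop", "streamSid": stream_sid, "stop": stop})
```

For example, stop_message(stream_sid, "escalate", "agents") produces the same structure as the example above.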

Published October 10, 2022 17:24 • Last Updated June 17, 2026 14:13
