Autopilot Agentic: Voice Settings – Knowledge Base

Talkdesk Autopilot’s Voice settings page allows you to configure the Autopilot's synthesized voice, including language detection, voice profiles, and advanced behavioral traits.

You can select up to 5 languages/voices for each Autopilot that uses voice.

Note: Autopilot Agentic on digital channels (chat, SMS, etc.) has no limit on the number of languages, as generative AI models support the majority of the world's spoken languages.

To configure your Autopilot's voice settings, follow these steps:

1. Open the Autopilot app, select the Autopilot that you’ll be modifying, and choose the Channels tab [1].

2. Then, select the Voice channel [2] to access all settings.

1. Configure General Voice Settings

Allow barge-in

[Preview] Capture keypad input (DTMF)

Automatic language detection

2. Adjust End of Speech Sensitivity

3. Select Your Primary Voice Setting

Available Languages

Understanding Voice Types

4. Advanced Configuration

For Natural Voices

For Neural and Neural HD Voices

5. Add Secondary Languages (Language Sets)

6. Testing your Voices and Languages

1. Configure General Voice Settings

At the top of the Voice settings page, you can enable core interaction behaviors:

Allow barge-in:

If enabled, contacts can interrupt the Autopilot's response instead of waiting for the entire sentence to finish. The Autopilot detects what the contact says in real time and replies accordingly.

Barge-in delay:

When enabled, this setting adds a short period at the start of a call during which the contact cannot interrupt the Autopilot. After the delay, barge-in works as normal. This setting is disabled by default, and the maximum allowed delay is 20 seconds, which is long enough to accommodate extended welcome messages, including those with legal disclosures.

Notes:

Barge-in delay applies only to inbound calls and cannot be enabled for outbound calls.
The delay is useful when carrier-side echo or audio artifacts at the beginning of a call might otherwise be mistaken for caller speech, leading to unintended interruptions.
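
The interaction between barge-in and the barge-in delay can be sketched in a few lines of Python. This is only an illustrative model of the behavior described above, not Talkdesk code: the function name `can_interrupt` and its parameters are hypothetical, and treating the exact end of the delay as interruptible is an assumption.

```python
MAX_BARGE_IN_DELAY_SECONDS = 20  # maximum delay stated in this article


def can_interrupt(seconds_since_call_start: float,
                  barge_in_enabled: bool,
                  barge_in_delay_seconds: float = 0.0,
                  is_inbound: bool = True) -> bool:
    """Return True if caller speech would interrupt the Autopilot (illustrative)."""
    if not 0 <= barge_in_delay_seconds <= MAX_BARGE_IN_DELAY_SECONDS:
        raise ValueError("barge-in delay must be between 0 and 20 seconds")
    if not barge_in_enabled:
        return False
    # The delay applies only to inbound calls; outbound calls get no delay.
    delay = barge_in_delay_seconds if is_inbound else 0.0
    # Assumption: interruption is allowed from the moment the delay has elapsed.
    return seconds_since_call_start >= delay
```

With an 8-second delay on an inbound call, `can_interrupt(3, True, 8)` is False (the welcome message is still protected), while `can_interrupt(8, True, 8)` is True.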

[Preview] Capture keypad input (DTMF):

Note: This is a preview feature, available on request for select customers.

Enable this setting so the Autopilot can capture and use digits that the contact enters on their phone's keypad.

DTMF (Dual-Tone Multi-Frequency) is ideal for noisy environments or when the contact needs to share sensitive information that cannot be said aloud (e.g., PIN codes, account numbers, date of birth).
The setup is automatic, but we recommend instructing the contact to use the keypad and press the pound key (#) to submit all entered digits.

Example phrasing the AI Agent can use:

"You can use your keypad and press the pound key (#) to submit your entry."

Once Capture keypad input (DTMF) is enabled, the Autopilot accepts keypad input from the contact starting with the next call.

How DTMF input is captured

While DTMF input is being captured, voice input is not picked up; the Autopilot listens only for keypad input.
Once the first digit is entered, the contact has 10 seconds to press each subsequent digit or #. If the timeout is reached, the digits entered up to that point are submitted.
Pressing # at any time immediately submits all digits entered so far as the contact's response.
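
The capture rules above can be illustrated with a small Python sketch that replays timestamped key presses. It is a simplified model under stated assumptions, not the product's implementation: `collect_dtmf` is a hypothetical name, every key other than # is treated as a digit, and the end of the list of presses is treated as the timeout being reached.

```python
INTER_DIGIT_TIMEOUT_SECONDS = 10.0  # time allowed between key presses, per this article


def collect_dtmf(key_presses, timeout=INTER_DIGIT_TIMEOUT_SECONDS):
    """Replay (timestamp_seconds, key) presses and return (digits, reason).

    reason is "pound" when # submitted the entry, or "timeout" when the gap
    after the last accepted digit exceeded the timeout (or the presses ran out).
    """
    digits = []
    last_time = None  # the timer only starts once the first digit is entered
    for timestamp, key in key_presses:
        if last_time is not None and timestamp - last_time > timeout:
            # Too long since the previous digit: submit what was entered so far.
            return "".join(digits), "timeout"
        if key == "#":
            # The pound key submits immediately.
            return "".join(digits), "pound"
        digits.append(key)
        last_time = timestamp
    return "".join(digits), "timeout"
```

For example, `collect_dtmf([(0.0, "1"), (2.0, "2"), (3.5, "3"), (4.0, "#")])` returns `("123", "pound")`, while a 14-second gap before the third key in `[(0.0, "4"), (1.0, "2"), (15.0, "7")]` yields `("42", "timeout")`.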

Automatic language detection:

This allows the Autopilot to automatically detect the language the contact is using.

Notes:

Detection is limited to the languages you select (up to a maximum of 5). This ensures accurate identification and tailored responses.
We advise using this feature only when necessary, as it increases latency.

2. Adjust End of Speech Sensitivity

Use the slider to set how long the Autopilot waits for the contact to finish speaking before it responds.

Lower values: The Autopilot responds faster but might cut the user's speech off during long pauses.
Higher values: The Autopilot takes longer to respond, lowering the chance of cutting the user off.
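
To make the trade-off concrete, here is a minimal Python sketch. It assumes the slider maps to a silence threshold in seconds, which this article does not state; the function `end_of_speech_time` and the sample timings are hypothetical.

```python
def end_of_speech_time(speech_segments, silence_threshold):
    """Return the time (seconds) at which the contact's turn is treated as finished.

    speech_segments: ordered list of (start, end) times when the contact is talking.
    silence_threshold: silence that must follow speech before the Autopilot replies.
    """
    pairs = zip(speech_segments, speech_segments[1:])
    for (_, end), (next_start, _) in pairs:
        if next_start - end >= silence_threshold:
            # The pause was long enough to count as the end of the turn,
            # so anything said afterwards is cut off.
            return end + silence_threshold
    # No pause was long enough: the turn ends after the final segment.
    return speech_segments[-1][1] + silence_threshold


# The contact speaks for 2 s, pauses for 1 s, then speaks for 2 more seconds.
segments = [(0.0, 2.0), (3.0, 5.0)]
low = end_of_speech_time(segments, 0.5)   # replies during the pause
high = end_of_speech_time(segments, 1.5)  # waits until the contact is done
```

In this sketch the lower threshold responds at 2.5 seconds and misses the second half of the answer, while the higher threshold waits until 6.5 seconds and hears all of it.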

3. Select Your Primary Voice Setting

This is the default voice and language your Autopilot will use.

Language: First, choose your primary operating language from the dropdown.
Voice profile: Then, select the specific voice persona you want to use.

Available Languages

Available languages vary, and the list keeps expanding as more languages and voices are added. See AI Supported Languages for the current list.

Understanding Voice Types:

When selecting a voice profile, you will see three distinct types of voices.

Natural Voices: Realistic, human-sounding voices. They provide high-quality, fluid, and natural-sounding results. These voices can be multilingual.
Neural Voices: Effective voices that can be slightly less expressive than Natural Voices, depending on the chosen language/voice.
Neural HD Voices: High-quality voices that are highly expressive and human-like. They interpret input text and generate speech with the appropriate emotion, pace, and rhythm without the need for manual adjustments. Neural HD Voices replicate natural speech patterns, including spontaneous pauses and emphasis, so it sounds as if an actual person is speaking directly with you.

Note: If you plan to add secondary languages later, the voice type you select here dictates what you can select later. Natural voices can only be paired with other Natural voices, while Neural/Neural HD voices must be paired with other Neural/Neural HD voices.
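
The pairing rule in this note can be expressed as a simple lookup. The sketch below is illustrative only; the identifiers `NATURAL`, `NEURAL`, `NEURAL_HD`, and `can_pair` are hypothetical labels, not values exposed by the product.

```python
# Voice type labels used only in this sketch (hypothetical identifiers).
NATURAL, NEURAL, NEURAL_HD = "natural", "neural", "neural_hd"

# Neural and Neural HD voices share a family; Natural voices form their own.
VOICE_FAMILY = {NATURAL: "natural", NEURAL: "neural", NEURAL_HD: "neural"}


def can_pair(primary_type: str, secondary_type: str) -> bool:
    """Return True if a secondary voice type can be combined with the primary."""
    return VOICE_FAMILY[primary_type] == VOICE_FAMILY[secondary_type]
```

Here `can_pair(NEURAL, NEURAL_HD)` is True, while `can_pair(NATURAL, NEURAL_HD)` is False.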

4. Advanced Configuration

Depending on the type of Voice Profile you selected in Step 3, clicking Advanced configuration will reveal different tuning options:

For Natural Voices:

Speaking speed: Adjust the rate from Extra Slow to Extra Fast based on your preferred user experience and the language/voice you select.
Similarity boost: Indicates how closely the voice matches the original voice model. Higher values enhance the "likeness" and clarity of the speaker, but excessive levels may introduce digital artifacts or unnatural audio distortions.
Stability: Controls the balance between emotional range and vocal consistency. Low stability introduces human-like inflections and rhythm. High stability reduces variation for a more controlled, predictable output.

For Neural and Neural HD Voices:

Speaking speed: Adjust the rate from “Extra Slow” to “Extra Fast”.
Temperature: Controls how expressive the voice sounds. Higher values (maximum 1.0) add variety and emotion, while lower values (minimum 0.0) keep the speech steady and consistent.
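
As a quick reference, the options available per voice type and the Temperature range can be summarized in a short Python sketch. The setting keys and the `validate_advanced_settings` helper are hypothetical names chosen for illustration; only the option lists and the 0.0 to 1.0 range come from this article.

```python
# Advanced options per voice type, as listed in this article.
# The keys and setting names are hypothetical labels for this sketch.
ADVANCED_OPTIONS = {
    "natural": ("speaking_speed", "similarity_boost", "stability"),
    "neural": ("speaking_speed", "temperature"),
    "neural_hd": ("speaking_speed", "temperature"),
}


def validate_advanced_settings(voice_type, settings):
    """Return a list of problems; an empty list means the settings look valid."""
    allowed = ADVANCED_OPTIONS[voice_type]
    problems = [
        f"{name} is not available for {voice_type} voices"
        for name in settings
        if name not in allowed
    ]
    temperature = settings.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        problems.append("temperature must be between 0.0 and 1.0")
    return problems
```

For instance, passing `{"temperature": 0.4}` for a `"natural"` voice reports that Temperature is not available for that voice type.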

5. Add Secondary Languages (Language Sets)

To make your Autopilot multilingual, click Add more languages at the bottom of the page. You can add up to 4 additional languages (for a total limit of 5 languages per Autopilot).

Adding a secondary language automatically enables Automatic language detection.
Creating Language Sets: Because certain voices can speak multiple languages natively, you can select multiple languages within a single "Language Set" and apply one voice to all of them.
If you need a language that your current voice does not support, simply create a new Language Set and assign a different voice profile to it.
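
The Language Set rules above can be checked with a short Python sketch. It is a simplified model, not a Talkdesk API: the `LanguageSet` class, the voice names, and the language codes are all hypothetical.

```python
from dataclasses import dataclass

MAX_LANGUAGES = 5  # total per Autopilot, as stated in this article


@dataclass(frozen=True)
class LanguageSet:
    voice: str                   # hypothetical voice profile name
    voice_languages: frozenset   # languages this voice can speak
    languages: tuple             # languages assigned to this set


def validate_language_sets(language_sets):
    """Return a list of problems; an empty list means the setup looks valid."""
    problems = []
    total = sum(len(s.languages) for s in language_sets)
    if total > MAX_LANGUAGES:
        problems.append(f"{total} languages selected; the limit is {MAX_LANGUAGES}")
    for s in language_sets:
        unsupported = [lang for lang in s.languages if lang not in s.voice_languages]
        if unsupported:
            problems.append(f"{s.voice} does not support: {', '.join(unsupported)}")
    return problems


def detection_enabled(language_sets):
    """Adding any secondary language turns on automatic language detection."""
    return sum(len(s.languages) for s in language_sets) > 1


# One multilingual voice covers two languages; a second set adds a third.
multilingual = LanguageSet("Voice A", frozenset({"en-US", "es-ES", "pt-PT"}), ("en-US", "es-ES"))
japanese = LanguageSet("Voice B", frozenset({"ja-JP"}), ("ja-JP",))
problems = validate_language_sets([multilingual, japanese])
```

In this example `problems` is empty and `detection_enabled([multilingual, japanese])` is True. Assigning a language that the voice cannot speak produces a message pointing to the set that needs a different voice profile.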

When finished making the adjustments, click Save [8].

6. Testing your Voices and Languages

To hear the Autopilot's new voice profile, go to the Phone or Conversations app and place an outgoing call to a number associated with your Autopilot. This lets you hear how the new settings sound to your callers.

Alternatively, you can visit the AI Agents Platform, choose the agent that is connected to your specific Autopilot, and then click “Test”.
