Training Speech-to-Text Models in AI Trainer

Before a contact center call can be analyzed (for example, with sentiment analysis or intent detection), its speech must first be transcribed to text. Speech-to-text (STT) models are an integral part of many AI-powered products like Talkdesk AI Trainer or Interaction Analytics.

Mistranscriptions happen in the everyday operations of a contact center, mainly with brand and product names or other out-of-vocabulary, domain-specific phrases. With AI Trainer, you can correct these common problems and thereby improve STT model performance.


Training STT models

To improve the STT model performance:


1. Select the right model from the AI Trainer homepage [1] (also known as the “Models page”). Speech-to-text models have a chip indicating their type (Speech-to-text).

Tip: Your account can have multiple STT models configured, so your transcriptions can be adapted to multiple languages and region-specific vernaculars.

You’ll be redirected to the Vocabulary page, where you’ll see lists of all the phrases that you manage. These phrases correct the STT model and steer its transcriptions according to the rules you specify when adding new phrases.

Two main functionalities are available on this page:


2. New phrase [2], where you can add new phrases to Custom Vocabulary using “Sounds Like” or “International Phonetic Alphabet” (see subsections below).


3. Test vocabulary [3]. After adding your first phrases, training will start automatically. When the model is trained, you will see an option to test the improvement with your own voice.

Tip: Check if your account has any STT models configured. If this is not the case, please contact Talkdesk to request the configuration.



Adding a phrase to Vocabulary

To add a new phrase, please follow these steps:


1. Click on New phrase [1].


2. In the side panel, add the phrase [2] you want to correct.

Note: This is the only mandatory field. However, we advise using either the “Sounds like” or the IPA (International Phonetic Alphabet) field to specify how the phrase you’re correcting sounds, and the “Display as” field to specify the expected transcription.

3. Write how the phrase sounds (i.e., phonetically) in the “Sounds like” field [3]:

a. Add how the transcribed phrase sounds. If you use more than one word, join them with a dash. For acronyms, use dots after each letter.

b. Alternatively, add a phonetic form of the phrase to specify its pronunciation more precisely, including open/closed sounds, duration, and pauses in speech.

4. In the “Display as” field [4], write how the transcript should look.

5. When you’re done, click Create [5].

Tip: You can undo this action at any time by clicking the Cancel button.


The option to test the improvement in AI Trainer will be unlocked as soon as training is done.
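
The two formatting rules for the “Sounds like” field (step 3a) can be sketched in a few lines of Python. This is only an illustration for preparing phrases before typing them into the side panel. The function name sounds_like and its acronym flag are hypothetical and are not part of AI Trainer. The sketch also assumes that “dash” means a plain hyphen and leaves letter casing untouched, since the rules above do not specify casing.

```python
def sounds_like(phrase: str, acronym: bool = False) -> str:
    """Format a phrase following the two rules in step 3a (hypothetical helper).

    - Multi-word phrases: words are joined with a dash (assumed to be "-").
    - Acronyms: a dot is placed after each letter.
    """
    if acronym:
        # Keep only letters and digits, then put a dot after each one.
        letters = [ch for ch in phrase if ch.isalnum()]
        return "".join(f"{ch}." for ch in letters)
    # split() with no arguments also collapses repeated or surrounding spaces.
    return "-".join(phrase.split())


if __name__ == "__main__":
    print(sounds_like("talk desk"))          # talk-desk
    print(sounds_like("CRM", acronym=True))  # C.R.M.
```

The resulting strings are meant to be entered manually in the “Sounds like” field. Whether AI Trainer accepts other separators or casing is not covered here.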





Testing Custom Vocabulary

After new phrases are added to the Custom Vocabulary, you can test in AI Trainer whether the transcription has improved:


  • With your voice [1], by recording some speech and seeing the live transcription.

Note: Reported transcriptions will be available soon.
