Training Speech-to-Text Models in Talkdesk CXA Operations Center (formerly AI Trainer)

Before a contact center can analyze calls for intent, topics, and other AI-driven insights, it must first convert speech to text (STT). Speech-to-text models form the foundation of AI-powered products such as Talkdesk Copilot and Interaction Analytics.

Transcription errors often occur with brand names, unique product names, acronyms, and domain-specific jargon, collectively known as out-of-vocabulary words. With Talkdesk CXA Operations Center™ (formerly AI Trainer), you can correct these specific recognition issues to improve overall model performance.

 

 

Factors Affecting Transcription Accuracy

While Custom Vocabulary boosts recognition for specific terms, the overall quality of the transcription relies heavily on the source audio. Please be aware that STT models are sensitive to:

  • Background Noise: Loud environments or static can significantly reduce clarity.
  • Crosstalk: Multiple voices speaking simultaneously ("speaking on top" of one another).
  • Overlapping Speech: If a custom vocabulary term is spoken while another person is talking, the model may fail to isolate and recognize it.
  • Audio Quality: Low-quality microphones or connection issues will impact results.
  • Accents & Dialects: STT models are trained on specific regional pronunciations. If a speaker's accent differs significantly from the selected Language Locale (e.g., using a US English model for a speaker with a strong British accent), the AI may struggle to match the sounds to words. Always ensure the selected locale matches the primary accent of your contact center.

 

Accessing your Custom Vocabulary Model

  1. Navigate to the CXA Operations Center homepage (Models page).
  2. Select the Custom Vocabulary STT (speech-to-text) model [1] you wish to improve.
    • Tip: If your operation spans multiple regions, ensure you select the correct language model (e.g., en-US vs en-GB).
  3. You will be redirected to the Custom Vocabulary page [2].

 

The Vocabulary List

The Vocabulary page displays all the custom terms you have added to the model. It features the following columns:

  • Phrase: The value the AI is trained to "listen" for.
  • Display as: The value that will appear in the final transcription.

From this view, you can Edit existing entries by clicking on the row, or add new ones.

 

Adding or Editing a Phrase

To improve recognition accuracy, you "boost" specific words by adding them to the vocabulary.

  1. Click the New Entry button in the top right corner.
  2. A side panel will appear with two distinct fields. Because the formatting instructions in the UI have been streamlined, follow the Formatting Guidelines below to ensure the AI interprets your input correctly.

 

1. The "Phrase" Field: 

This field prepares the service to recognize a specific term. It signals to the AI that a word or phrase that is difficult to recognize is likely to appear, such as:

  • Names
  • Words or acronyms unique to a specific industry or organization
  • Geographical locations

 To ensure the model recognizes the sound correctly, you must use the following formatting:

  • No Spaces: Never use spaces in this field.
  • Hyphens: Use hyphens to separate words (e.g., New-York).
  • Acronyms: Use periods to separate letters if they should be spoken individually (e.g., A.P.I.).
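
The rules above can be checked before an entry is typed into the side panel. The following Python sketch is a hypothetical local helper, not part of CXA Operations Center; the function name find_phrase_issues is an assumption made for illustration. It flags spaces, and also digits, which the Formatting Guidelines in this article say must be spelled out.

```python
import re


def find_phrase_issues(phrase: str) -> list[str]:
    """Return a list of formatting problems found in a "Phrase" value.

    Hypothetical helper for illustration only; it is not a Talkdesk API.
    """
    issues = []
    # Rule: never use spaces in the Phrase field; use hyphens between words.
    if re.search(r"\s", phrase):
        issues.append("contains spaces; separate words with hyphens")
    # Rule from the Formatting Guidelines: numbers must be spelled out.
    if re.search(r"\d", phrase):
        issues.append("contains digits; spell numbers out")
    return issues


if __name__ == "__main__":
    for candidate in ["New-York", "New York", "A.P.I.", "VX02Q"]:
        print(candidate, "->", find_phrase_issues(candidate) or "OK")
```

An empty list means the value passed both checks. The sketch does not verify that acronyms use periods, because it cannot know whether a term is spoken letter by letter.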

 

2. The "Display as" Field:

This is the visual output. You have total freedom here to use spaces, capitalization, and standard punctuation.

 

Formatting Guidelines & Best Practices

Use this reference table when adding new terms:

| Scenario | "Phrase" Input (Format: No spaces) | "Display as" Input (Standard Format) | Why? |
|---|---|---|---|
| Multi-word Brand | Talkdesk-Phone | Talkdesk Phone | Hyphens link the words, forcing the model to treat them as a single unique entity. |
| Acronyms | H.O.D. | HOD | Periods tell the engine to pronounce letters individually ("H-O-D") rather than as a word ("Hod"). Any pronounced letters must be separated by a period. |
| Plural Acronyms | A.B.C.-s | ABCs | The hyphen before the 's' is required to denote plurality clearly. |
| Hybrid Terms | Dynamo-D.B. | DynamoDB | Separates the word ("Dynamo") from the acronym ("DB"). |
| Numbers | V.X.-zero-two-Q. | VX02Q | Do not include digits in the Phrase field. Numbers must be spelled out. |
| Sound-alike Phrases | Contoso | Contoso | Improve common sound-alike phrases (“Contoso” vs “can’t do so”). |
| Specific Terms | Acetaminophen | Acetaminophen | Improve & boost terms that are often mistranscribed. |
| Simple Boost | Los-Angeles | Los Angeles | Simply typing the word (e.g., tricky brands) increases its importance. |
| Acronym Boost | NPS | NPS | Increases the likelihood of recognizing the acronym (Net Promoter Score) over common sounds like “MPs”. |
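
The patterns in the table can be sketched as small helpers that build a "Phrase" value from its parts. This is an illustrative Python sketch, not a Talkdesk tool; the function names acronym_phrase, digits_phrase, and join_parts are hypothetical. Spelling numbers digit by digit follows the VX02Q row only, so other numbers may need a different spelling.

```python
# Hypothetical helpers for illustration; they are not part of any Talkdesk API.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def acronym_phrase(letters: str, plural: bool = False) -> str:
    """Put a period after every letter; a plural acronym ends in '-s'."""
    phrase = "".join(f"{letter.upper()}." for letter in letters)
    return f"{phrase}-s" if plural else phrase


def digits_phrase(digits: str) -> str:
    """Spell each digit out and link the words with hyphens."""
    return "-".join(DIGIT_WORDS[digit] for digit in digits)


def join_parts(*parts: str) -> str:
    """Link words, acronyms, and spelled-out numbers with hyphens (no spaces)."""
    return "-".join(parts)


if __name__ == "__main__":
    print(acronym_phrase("HOD"))                       # Acronyms row
    print(acronym_phrase("ABC", plural=True))          # Plural Acronyms row
    print(join_parts("Dynamo", acronym_phrase("DB")))  # Hybrid Terms row
    print(join_parts(acronym_phrase("VX"), digits_phrase("02"), acronym_phrase("Q")))
```

The caller still decides which parts of a term are words, acronyms, or numbers, because that depends on how the term is spoken.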

Notes: 

  • The "Sounds Like" and "IPA" fields have been removed to improve the STT model. The speech engine relies on the Phrase field.
  • Examples may vary from language to language and depend on the factors that affect transcription.

 

Saving your Changes

  • New Phrases: Click Create to add the term to the list.
  • Editing: If correcting an existing term, modify the fields and click Save to update the model.

 

Testing the Custom Vocabulary

After adding new phrases, the model will update automatically. You should verify that the changes have fixed the transcription error.

  1. Click the Test vocabulary button at the top-right corner.
  2. The modal will display the Record audio option.
  3. Click Start recording and speak a sentence containing your new term naturally.
    • Note: Test Custom Vocabulary sessions remain active for up to 30 seconds or until you click Stop Recording.

Review the Output: Check the text below to see if the "Phrase" was recognized and mapped correctly to the "Display as" format.
