Guardian Identity FAQ


Identity within Guardian

Note: Guardian Identity is Early Access; please reach out to your sales rep or Talkdesk Support for more information.


How can Talkdesk accounts use this capability? 

This feature is available as a Pay-As-You-Go (PAYG) add-on named “Guardian Identity”. 

How is the “Guardian Identity” add-on provisioned? 

Users within all account editions can send an email requesting the “Guardian Identity” add-on activation, thus unlocking the functionality.

The Customer Success Manager (CSM) receives the email and follows up on the request. Once access to the add-on is granted, the Studio components become available for use in existing or new flows.

Are there any capacity limits or thresholds to be aware of?

At this point, the infrastructure is being tested to support up to 50 calls per second.

Is it possible for customers already running voice biometrics with another vendor to migrate to our offering?

Yes! The platform is capable of repurposing a client’s existing audio corpus (recordings) to recreate that client’s voice biometrics model.

How do callers start showing in the “Identity” menu entry in Guardian?

Callers start showing after the first validation is performed, i.e., when they call in after the enrollment is done. 

What does “Phone Validation” mean? 

The “Phone Validation” column is part of a fraud detection capability that will be added in upcoming releases.


Voice Biometric Authentication


What is Voice Biometric Authentication?

Voice Biometric (VB) Authentication is a technology that recognizes a person’s voice characteristics by measuring the unique features that their physical make-up (physiology) imparts to sound. It makes it possible to identify individuals by extracting identifiers from their voices. It is also called “speaker recognition” or “voice authentication”.

How does voice biometrics work?

Voice biometric systems process audio utterances to extract certain characteristics that are speaker-specific. A statistical model – commonly called a “voiceprint” or a “voice signature” – is built from these features.

When comparing new audio, for example, while authenticating a person against their previously enrolled voiceprint, the same process is applied and a similarity measure is obtained through pattern matching, whose value will indicate a pass or a fail.
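
As a rough illustration of the comparison step, the sketch below treats a voiceprint as a plain list of numbers and scores new audio with cosine similarity against a threshold. The list representation, the `cosine_similarity` and `verify` helper names, and the 0.8 threshold are all assumptions made for this example; they do not describe how the Guardian Identity engine works internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a silent or empty vector cannot match anything
    return dot / (norm_a * norm_b)


def verify(enrolled_voiceprint, new_features, threshold=0.8):
    """Hypothetical pass/fail decision: pass when the score meets the threshold."""
    score = cosine_similarity(enrolled_voiceprint, new_features)
    return score >= threshold, score
```

In a sketch like this, raising the threshold makes false accepts less likely at the cost of more false rejects, which is the usual trade-off when tuning a pass/fail similarity score.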

What is the difference between Text-Dependent and Text-Independent?

A text-dependent system is one where the customer is prompted to repeat the same pre-determined phrase used during enrollment. This is commonly referred to as “active” or “prompt” voice biometrics.

In text-independent mode, no specific text is expected: the customer is free to speak naturally and use any words. This technology is typically used while passively processing conversations.

Use cases and customer experience, rather than performance or security, often drive the choice between text-dependent or text-independent technologies. 

Can I still be recognized if I have a cold or am taking some kind of medication?

Yes. Having a cold (or being medicated) does not change the natural variables in your voice. Since VB engines exploit information related to the shape of the full vocal tract (i.e., the physiological make-up of an individual), the effect of a cold (or medication) will be minimal and is handled by the system’s normalization.

To a human, the person may sound very different from usual, whereas a VB engine will notice that the fundamental features of the voice are the same.

Can an impersonator fool a voice biometric system?

When someone mimics another person’s voice, they copy high-level language traits, whereas voice biometric systems exploit low-level features that relate to the speaker’s vocal tract.

It is easy to copy the way a person speaks (accent, mannerisms), but impossible to alter the way speech is produced (the effect of the vocal tract).

To a human, an impersonator may sound very much like another person, whereas a VB engine will notice that the fundamental features of the voice are not the same.

Can a recording of a person fool a voice biometric system (aka “Replay attack”)?

Using a recording device to play back another person’s voice is known as a “Replay Attack”. Voice biometric engines can detect playback from recording devices using several techniques, including detecting the absence of the highest and lowest frequencies, which are not audible to humans but are detectable by VB engines. Additionally, the process of replaying distorts the audio, and this distortion is detectable by replay detection algorithms.

Another technique is “Identical Utterance Checking”, where the VB engine checks whether the utterance being analyzed is too similar to one from a previous authentication. The most robust method adds liveness checking, where a random element, such as a random digit, must be uttered. A recording of another person’s static phrase, or of a previous random phrase, will not be enough to pass, given the random nature of the challenge.
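
The two checks described above can be sketched as follows. This is a simplified illustration only: the function names, the plain-list feature representation, the `tolerance` value, and the assumption that `spoken_digits` comes from a separate speech recognition step are all hypothetical and are not taken from the product.

```python
import secrets


def make_digit_challenge(length=4):
    """Generate a random digit string that the caller must speak (liveness check)."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def is_identical_utterance(new_features, previous_attempts, tolerance=1e-6):
    """Flag an attempt whose features are near-identical to an earlier attempt.

    Live speech never repeats exactly, so a near-perfect match with a
    previous authentication suggests a replayed recording.
    """
    for old in previous_attempts:
        if len(old) == len(new_features) and all(
            abs(x - y) <= tolerance for x, y in zip(old, new_features)
        ):
            return True
    return False


def accept_attempt(new_features, previous_attempts, spoken_digits, challenge):
    """Reject suspected replays and wrong responses to the random challenge."""
    if is_identical_utterance(new_features, previous_attempts):
        return False
    return spoken_digits == challenge
```

Because the challenge is generated fresh for each call, a recording captured during an earlier call would contain the wrong digits even if it passed the similarity check.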

In addition, the chances of bad actors obtaining a recording of a person’s authentication phrase for fraudulent purposes are very small. This is not a common technique in mass attacks, since it requires targeting a specific person.

Can the system distinguish between twins?

The features used by voice biometric engines strongly relate to the shape of the vocal tract. Identical twins share the same genes but develop differently, so their vocal tracts differ in ways that voice biometric systems are able to distinguish.



What is a typical passphrase for an effective Voice Biometric authentication?

Even though, in theory, VB can work with any passphrase, in practice, a successful and effective passphrase should follow some rules.

User-friendliness is key: a passphrase should be natural and intuitive for the user. Otherwise, the user could get frustrated, and failing to repeat the passphrase, or repeating it under stress, would defeat the purpose.

Ideally, it should include a statement related to its use (activating an application, confirming a transaction, confirming identity, among others).

The length of the passphrase will depend on the level of security required. For example:

  • “Please complete my transaction”.
  • “I confirm that I am the person authorized to make this transaction with <INSTITUTION NAME>”.

What are the pre-set passphrases?

The two pre-tuned passphrases available in the system, in English, are:

  • “Please authenticate me with my voice”.
  • “My password is my voice”.

We are working towards adding pre-tuned passphrases in Spanish, French, and German.

How are custom passphrases set, and what is the associated cost?

Setting a custom passphrase is a manual process that involves the product team, because the model must be trained for that phrase.

The cost to train and fine-tune the model to work with custom passphrases is $25k per custom phrase.

How many repetitions of the passphrase does it require to enroll a caller?

For a successful enrollment, the number of times a contact has to repeat the passphrase may vary between two and five, depending on factors such as audio quality, passphrase wording, and background noise.
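
One way to picture why the number of repetitions varies is the sketch below, which keeps asking for repetitions until enough usable samples have been collected. The quality scores, the `quality_threshold` value, and the rule that two usable samples complete enrollment are assumptions made for illustration; only the two-to-five range comes from the answer above.

```python
MIN_REPETITIONS = 2  # fewest repetitions an enrollment can take
MAX_REPETITIONS = 5  # most repetitions a caller is asked for


def enroll(sample_qualities, quality_threshold=0.7):
    """Return how many repetitions were needed, or None if enrollment failed.

    sample_qualities holds hypothetical per-repetition quality scores in
    [0, 1], in the order the caller spoke them. Noisy or unclear repetitions
    score low and do not count toward enrollment.
    """
    usable = 0
    for count, quality in enumerate(sample_qualities[:MAX_REPETITIONS], start=1):
        if quality >= quality_threshold:
            usable += 1
        if usable >= MIN_REPETITIONS:
            return count
    return None
```

With clean audio, such a loop would finish after two repetitions, while background noise or a hard-to-pronounce passphrase would push it toward the upper limit.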
