The Speech Recognition template demonstrates how to use Speech Recognition to add transcription, keyword detection, and voice navigation command detection (based on basic natural language understanding) to your Lenses. The template contains several helpers that let you create voice experiences without scripting.
Note: To learn more about Speech Recognition, check out the Speech Recognition Guide for detailed explanations of the concepts and scripting. To learn more about voice UI, check out the Voice UI Template.
The template includes two examples that show how to use Speech Recognition:
- Speech Transcription Example - transcribes speech and returns live and final transcription results.
- Keyword Detection Example - lets you define a list of keywords and detects them on top of transcription using the Keyword Classifier.
- Transcription is currently available only for English; its limitations include, for example, new names for things, slang words, and strong accents.
- Do not play sound or speech from the Lens while activating the microphone to capture sound.
- Try to avoid background noise and far device distance while activating the microphone to capture sound.
- If the microphone is muted for more than two minutes, transcription won't continue after unmuting; you'll need to reset the Preview panel to re-enable it.
- If you were previously logged in to MyLenses and are having trouble seeing the preview in Lens Studio, log out of MyLenses and log in again.
Here is how to log out of and log in to MyLenses.
When we open the template, we can find the two examples in the Objects panel.
The main asset used for Speech Recognition is the VoiceML Module. We can find it in the Resources panel. We attach it to the scripts in each example to configure settings for Speech Recognition.
Audio From Microphone
We use the Audio From Microphone asset in the Resources panel for voice input. We attach it to the scripts in each example to enable voice input in Lens Studio.
To use Audio From Microphone in the Preview panel, you will need to provide access to your microphone. At the bottom of the Preview panel, click the microphone. Test with your voice and watch the blue vertical volume meter to ensure you are not muted. Then try saying anything to see the transcription and scene objects react to the voice events.
Now let's take a look at the Transcription Example. In this example, we transcribe speech and render live and final transcription results with Screen Text objects. We also include a voice-reactive 2D Listening Animation as an example of triggering visuals with Voice Events.
If the voice event On Listening Enabled is successfully called, we can see the Listening Icon pop up. Now try speaking into the microphone. The icon animates while in listening mode, pauses when the final transcription results arrive, and turns red on an error.
Click on Transcription Example in the Objects panel. In the Inspector panel, we can see that SpeechRecognition.js is used in the Script Component. This is the main script we are going to use for all the examples. Now let's go through the details.
Notice that in the first section, we attach the VoiceML Module and Audio From Microphone to the Speech Recognition Script Component for Speech Recognition configuration and voice input in Lens Studio.
Basic Settings for Transcription
Now let’s go through some basic settings for transcription in the next section.
Is Transcription: enables transcription.
Is Live Transcription: enables an additional, live, slightly less accurate transcription before the final, more accurate transcription arrives.
Try enabling or disabling these settings to see the difference.
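To make the live/final distinction concrete, here is a small standalone sketch of how a script might consume transcription updates. The event shape (`transcription`, `isFinalTranscription`) mirrors the fields the template's script reads from VoiceML's listening-update event, but the handler factory and the simulated events below are hypothetical and run outside Lens Studio.

```javascript
// Hypothetical sketch: route live (interim) vs. final transcription results.
function makeTranscriptionHandler(onLive, onFinal) {
    return function (eventArgs) {
        if (eventArgs.transcription.trim() === "") {
            return; // ignore empty updates
        }
        if (eventArgs.isFinalTranscription) {
            onFinal(eventArgs.transcription); // final, more accurate result
        } else {
            onLive(eventArgs.transcription);  // fast, slightly less accurate
        }
    };
}

// Standalone usage with simulated events:
var liveText = [];
var finalText = [];
var handler = makeTranscriptionHandler(
    function (t) { liveText.push(t); },
    function (t) { finalText.push(t); }
);
handler({ transcription: "hel", isFinalTranscription: false });
handler({ transcription: "hello wor", isFinalTranscription: false });
handler({ transcription: "hello world", isFinalTranscription: true });
```

With Is Live Transcription disabled, only the final branch would ever fire.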
Here you can set the transcription result to a Screen Text with Transcription Text enabled. In this example, the Screen Text object is under Orthographic Camera -> Transcription Example UI -> Transcription Text.
We can also add speech contexts to the transcription to boost certain words for specific transcription scenarios. Use this for rarer words that aren't picked up well enough by transcription; the higher the boost value, the more likely the word is to appear in the transcription.
With the useSpeechContext setting enabled, we can attach the Speech Contexts object to it. In the Objects panel, click on the Transcription Example -> Speech Contexts object. In the Inspector panel, we can see a Speech Context Script Component attached to the object.
Add New Phrase to Speech Context
In the Speech Context Script Component, we can add words to the phrases and set a boost value for them. To add a new word, click on the Add Value field and input the word you want to add.
Tip: Notice that the phrases should be made of lowercase a-z letters and should be within the vocabulary.
When an OOV (out-of-vocabulary) phrase is added to the Speech Context, the voice event On Error Triggered will fire, and we will see the error message in the Logger. Here we take a random string "az zj" as an example. Try this by resetting the Lens in the Preview panel, then speaking with the microphone button enabled. We can then see the error message in the Logger.
Add New Speech Context
Alternatively, we can add a new Speech Context Script Component with a different boost value.
Tip: The boost value ranges from 1 to 10. We recommend starting with 5 and adjusting as needed (the higher the value, the more likely the word will appear in the transcription).
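The phrase and boost rules above can be summarized in a small validation helper. This function is not part of the template; it is a hypothetical sketch that only encodes the constraints stated in this guide (lowercase a-z words, boost between 1 and 10). It cannot detect out-of-vocabulary phrases like "az zj", which only fail at runtime via On Error Triggered.

```javascript
// Hypothetical helper: check a speech-context entry against the documented rules.
function validateSpeechContext(phrase, boost) {
    // Phrases: one or more lowercase a-z words separated by single spaces.
    var phraseOk = /^[a-z]+( [a-z]+)*$/.test(phrase);
    // Boost: the documented 1-10 range.
    var boostOk = boost >= 1 && boost <= 10;
    return { phraseOk: phraseOk, boostOk: boostOk };
}

var good = validateSpeechContext("cookie", 5);      // both checks pass
var badPhrase = validateSpeechContext("Az-Zj!", 5); // uppercase / punctuation
var badBoost = validateSpeechContext("soup", 42);   // boost out of range
```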
We will skip the Use Keyword and Use Command sections for now. Let's take a look at Edit Voice Event Callback. If we enable Edit Voice Event Callback, we can attach Behavior Scripts to different voice events and trigger different visuals. Here we have five voice events:
On Listening Enabled: Triggered when the microphone is enabled.
On Listening Disabled: Triggered when the microphone is disabled.
On Listening Triggered: Triggered when the system switches back to listening mode.
On Error Triggered: Triggered when there is an error in the transcription.
On Final Transcription Triggered: Triggered when the result is a full (final) transcription rather than a partial one.
Tip: In the Lens Studio Preview, the microphone button is not simulated on screen. When we reset the preview, On Listening Enabled is triggered automatically once Speech Recognition is initialized. To learn how to use the microphone button, preview the Lens in Snapchat. Please check the details in Previewing Your Lens.
Debug Message for Voice Events
Notice that if we enable Debug in Voice Event Callbacks, we can see each Voice Event printed in the Logger when it is triggered.
Send Behavior Triggers with Voice Events
For each Voice Event Callback, we can assign multiple Behavior Scripts. Click on the Add Value field to attach new Behavior Scripts.
Here in this example, we are attaching different behavior scripts to change the visuals for the listening icon and screen texts.
In the Objects panel, we have a Transcription Example -> Behaviors object, which contains all the behavior scripts used in this example. Click on each object; in the Inspector panel, we can see the details of its Behavior Script.
Take the On Listening Enabled Behavior object as an example: we use a Behavior Script to enable the Listening Icon.
Tip: To learn more about Behavior, check out the Behavior Template!
Keyword Detection Example
Now that you have learned basic transcription, let's take a look at the second example, Keyword Detection. Here we can trigger behaviors based on different keywords detected in the transcription!
Let’s disable the first example, enable the Keyword Detection Example, and Reset Preview!
If the voice event On Listening Enabled is successfully called, we can see the Listening Icon pop up. Now try speaking into the microphone. In this example, say "hungry" / "starve" / "I am hungry" to enable the food objects. Say "breakfast" / "I had breakfast" / "I went out for breakfast" to trigger a VFX effect with any food texture, "soup" / "dip" to trigger a VFX effect with the soup texture, or "cookie" / "I eat cookie this morning" to trigger a VFX effect with the cookie texture, etc.
Now click on the Keyword Detection Example. In the Inspector panel, we can see that we continue to use the Speech Recognition Script Component. Since keyword detection is based on the transcription, the Is Transcription setting is enabled.
Tip: We can also enable Is Live Transcription and Transcription Text as needed. With Is Live Transcription enabled, we get a faster keyword response; without it, we only get the keyword response from the final, more accurate transcription.
Click on Keyword Detection Example -> SpeechContexts. Notice that we use a different list of Speech Contexts here, boosting words like "hungry", "starve", "soup", "cookie", etc., which will be used in Keyword Detection!
Now go back to the Speech Recognition Script Component. Here we can see that Use Keyword is enabled for keyword detection!
Keywords Parent Object
Notice that we need to attach a scene object as the Keywords Parent Object: Keyword Detection Example -> Keywords object.
The Keywords object has a list of children, each with a Keyword Script Component. Click on each child object; in the Inspector panel, we can see the details of the Keyword Script Component, which lets us configure the basic settings for each keyword.
Tip: Any enabled object under the Keywords object with a Keyword Script Component attached will be added as a keyword.
Here let's take the keyword "Hungry" as an example. We define the keyword "Hungry", then define a list of aliases for it: "hungry", "starve", and "I am hungry". If any of these aliases appear in the transcription results, the keyword "Hungry" is detected. Aliases let us expand the set of phrases that should return "Hungry" to serve a specific Lens experience.
Note: When using keyword detection, the Snap engine will try to mitigate small transcription errors such as plurals instead of singulars or similar-sounding words (ship/sheep, etc.). Beyond that, use multiple keywords to cover the different ways users might say the same thing, like "cinema", "movies", "film".
Tip: Aliases can be more than one word, e.g., short phrases or sentences.
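The keyword/alias data model above can be sketched as a small standalone function. This is a deliberately simplified model for illustration, not the template's or the Keyword Classifier's actual implementation: the real classifier also handles plurals, similar-sounding words, and the error codes described later, while this sketch only does substring matching on lowercased text.

```javascript
// Simplified standalone model of keyword detection: each keyword owns a list
// of aliases, and the keyword is reported when one of its aliases appears in
// the (lowercased) transcription.
var keywords = [
    { keyword: "Hungry", aliases: ["hungry", "starve", "i am hungry"] },
    { keyword: "Cookie", aliases: ["cookie"] }
];

function detectKeyword(transcription) {
    var text = transcription.toLowerCase();
    for (var i = 0; i < keywords.length; i++) {
        var k = keywords[i];
        for (var j = 0; j < k.aliases.length; j++) {
            if (text.indexOf(k.aliases[j]) !== -1) {
                return k.keyword; // only one keyword per utterance
            }
        }
    }
    return null; // no alias matched
}
```

For example, `detectKeyword("I am hungry")` maps back to the keyword "Hungry" via its alias list.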
Add New Alias
To add a new alias, click on the Add Value field.
Add New Keyword
To add a new keyword, in the Objects panel, duplicate any keyword object under the Keywords object and modify its settings, or add a new scene object under the Keywords object and attach Keyword.js as a Script Component.
Send Behavior Triggers with Keyword Detection
In the next section, with Send Triggers enabled, we can attach multiple Behavior Scripts to On Keyword Triggered. If a keyword is detected, all the Behaviors attached to it will be triggered. To attach a new Behavior Script, click on the Add Value field.
Here let's take the keyword "Hungry" as an example. When "Hungry" is detected, we trigger the Set Object Scale Behavior Script. In the Objects panel, click on Keyword Detection Example -> Behaviors -> Set Object Scale Behavior object.
Behavior and Tween
In the Inspector panel, let's take a look at the first Script Component: we use a Behavior Script to trigger the tween animation at Keyword Detection Example -> Head Binding -> BitmojiFood -> Second Script Component. With the TweenColor Script Component, we can scale up the food 3D model.
Error Code for Keyword Responses
There are a few error codes which the NLP models (either keyword or command detection) might return:
- #SNAP_ERROR_INCONCLUSIVE: two or more keyword categories were detected
- #SNAP_ERROR_INDECISIVE: no keyword was detected
- #SNAP_ERROR_NONVERBAL: the audio input doesn't appear to be human speech
- #SNAP_ERROR_SILENCE: the silence was too long
- Anything else starting with #SNAP_ERROR_: errors not currently defined in this document, which should be ignored
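A script consuming keyword responses needs to separate real keywords from these error codes. The helper below is a hypothetical sketch (not part of the template) that encodes the list above: known codes get a readable message, and unknown #SNAP_ERROR_ codes are flagged as errors with no message so they can be ignored, as recommended.

```javascript
// Hypothetical helper: classify a keyword response as a keyword or an error.
var KNOWN_ERRORS = {
    "#SNAP_ERROR_INCONCLUSIVE": "two or more keyword categories detected",
    "#SNAP_ERROR_INDECISIVE": "no keyword detected",
    "#SNAP_ERROR_NONVERBAL": "audio input was not human speech",
    "#SNAP_ERROR_SILENCE": "silence was too long"
};

function classifyResponse(response) {
    if (response.indexOf("#SNAP_ERROR_") === 0) {
        // Unknown #SNAP_ERROR_ codes yield message: null and should be ignored.
        return { type: "error", message: KNOWN_ERRORS[response] || null };
    }
    return { type: "keyword", keyword: response };
}
```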
Tip: Only one keyword can be detected per utterance (continuous speech until a pause).
Now try resetting the Preview panel and, with the microphone button enabled in the Preview panel, say words that are not in the keyword list. We can then see the keyword error messages in the Logger.
Keyword Screen Text
After setting up all the keywords, let's get back to the Speech Recognition Script Component. With Keyword Text enabled, the script updates the Screen Text Component under Orthographic Camera -> Keyword Detection UI -> NLP Keyword Text object.
Voice Event - On Keyword Detected
In this example, when the keyword is detected, you might notice that we also trigger two tween animations:
- Orthographic Camera -> Keyword Detection UI -> Safe Render Region -> NLP Keyword Text Object -> First Script Component
- Orthographic Camera -> Keyword Detection UI -> Safe Render Region -> NLP Keyword Text Object -> Second Script Component
- And we also pause the Listening Icon animation.
Here we use a new Voice Event, On Keyword Detected. Let's go back to the main Speech Recognition Script Component under the Keyword Detection Example object.
Notice that with Use Keyword and Edit Voice Event Callback enabled, a new field is added to the Voice Event Callbacks: On Keyword Detected. As with the other Voice Events, we can add a list of Behavior Scripts, which will be triggered when any keyword is detected.
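The callback pattern described throughout this guide (a voice event name mapped to a list of behaviors that all fire together) can be modeled in a few lines. The sketch below is standalone and hypothetical: the event names mirror the template's Voice Events, but the behavior names are made up for illustration, and in the actual template this wiring is done in the Inspector rather than in code.

```javascript
// Minimal standalone sketch of voice-event callbacks: each event name maps
// to a list of behavior callbacks, all of which run when the event triggers.
var voiceEventCallbacks = {};

function addCallback(eventName, behavior) {
    if (!voiceEventCallbacks[eventName]) {
        voiceEventCallbacks[eventName] = [];
    }
    voiceEventCallbacks[eventName].push(behavior);
}

function triggerEvent(eventName) {
    var behaviors = voiceEventCallbacks[eventName] || [];
    for (var i = 0; i < behaviors.length; i++) {
        behaviors[i](); // run every attached behavior
    }
}

// Usage with made-up behavior names:
var fired = [];
addCallback("On Keyword Detected", function () { fired.push("pauseIconAnimation"); });
addCallback("On Keyword Detected", function () { fired.push("showKeywordText"); });
triggerEvent("On Keyword Detected");
```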
Previewing Your Lens
You’re now ready to preview your Lens! To preview your Lens in Snapchat, follow the Pairing to Snapchat guide.
Once the Lens is pushed to Snapchat, you will see the microphone button at the top left of the navigation bar. Press the button to start VoiceML; On Listening Enabled will be triggered. Press the button again to stop VoiceML; On Listening Disabled will be triggered.
Please refer to the guides below for additional information: