The Object Detection Template lets you instantiate and place UI elements on the screen based on the bounding boxes that a machine learning model outputs for objects of a given class.
Tip: If you already have an object detection model, you can skip down to the Importing Your Model section below. You can skip to the Customizing Your Lens Experience section if you’d like to use the example car or food detection.
Creating a Model
While the template comes with example car detection and food detection models for the ML Component, you can detect any kind of object by importing your own machine learning model. We’ll go through an example of what this might look like below.
Note: To learn more about Machine Learning and Lens Studio, take a look at the ML Overview page.
To create a model, you’ll need:
- Machine learning training code: code that describes how the model is trained (sometimes referred to as a notebook; download our example notebook).
- A data set: a collection of data that the code learns from (in this case, the COCO data set).
Tip: This dataset comes with a couple of example classes that you can swap. The provided training notebook also uses generalized classes, each made up of a couple of more specific classes, in order to perform better on this particular dataset.
Training Your Model
Head over to Google Colaboratory, select the Upload tab, and drag the Python notebook into the upload area.
The provided example uses the COCO dataset to train the model. Running the notebook installs all the necessary libraries and mounts Google Drive.
Tip: You can configure your training by editing parameters such as the iteration count. The notebook also lists all available COCO dataset classes.
With our files added, you can run the code by choosing Runtime > Run All in the menu bar. This process may take a while, as training a model is computationally intensive.
Note: When using a data set to train your model, make sure that you adhere to the usage license of that dataset.
Downloading your Model
You can scroll to the Train Loop section of the notebook to see how your machine learning model is coming along. Once you are happy with the result, you can download your .onnx file.
Importing your Model
Now that we have our model, we’ll import it into Lens Studio. Drag and drop your .onnx file into the Resources panel to bring it into Lens Studio.
Setting up MLComponent
Tip: If you are using the built in ML models, you can skip this section.
Next, we’ll tell the template to use this model. In the Objects panel, select the ML Component. Then, in the Inspector panel, click the field next to Model and, in the pop-up window, choose your newly imported model.
Next, we’ll set up the input for the ML Component to pass in the image most similar to how our model was trained.
The model that comes with the template uses the following input settings:
- Input shape is 128 x 256 with 3 channels (for RGB) - the model’s default settings are used here.
- DeviceCameraTexture is used as the input texture; that is, we pass the camera feed to the model.
- Input Transformer Settings
- Stretch is turned off, because the detector works better if objects in the input texture preserve their original proportions.
- Horizontal and Vertical alignments are set to Center
- Rotation is set to none
- Fill color is set to black
These transform settings take the original input texture and add padding where needed, depending on the device, to fit the aspect ratio of the input placeholder (in this case size 128 x 256, aspect = 0.5).
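Conceptually, this is a standard letterbox fit: scale the source texture to fit inside the placeholder while keeping its proportions, then pad the remainder with the fill color. The sketch below illustrates the math for a 128 x 256 placeholder; it is not the template’s actual code.

```python
def letterbox(src_w, src_h, dst_w=128, dst_h=256):
    """Scale a source image to fit inside dst while preserving its aspect
    ratio, then pad the remainder (centered, matching the Center
    alignment setting) with the fill color (black in the template)."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) // 2   # horizontal padding on each side
    pad_y = (dst_h - new_h) // 2   # vertical padding on each side
    return new_w, new_h, pad_x, pad_y
```

For example, a 720 x 1280 camera feed scales down to 128 x 228 and gets 14 pixels of black padding at the top and bottom.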
As for the outputs, we will keep all the default settings, since the MLController script will process the raw output data.
Trying Example Models
Although the default ML model is set up for car detection, the template also comes with a food detection model. You can swap to it by finding the model under Example Assets/Food Detection[TRY_SWAPPING] in the Resources panel and inputting it into the Model field on the ML Component.
ML Controller
Tip: If you are using the built in ML models, you can skip this section.
The MLController[EDIT_ME] object contains the ML Component and the MLController script, which controls the ML Component and processes its output.
By default, all you will need to do is link your ML Component to the MLController script, but you can find more model-specific parameters by ticking the Advanced checkbox. Output Cls and Output Loc need to have the same names as the outputs in your ML model; leave them as they are if you are using the provided notebook for training.
- Output Loc - name of the MLComponent output that provides the unprocessed detection locations.
- Output Cls - name of the MLComponent output that returns the scores (probabilities) of the detections.
Tip: Output Loc and Output Cls should have the same names as the outputs in your ML model; you can find the output names on the ML Component.
- Confidence Threshold - the minimum score an unprocessed detection needs in order to be taken into account. Detections scoring below the threshold are skipped.
- TopK - the number of highest-scoring detections to keep.
- Loader - the UI element that is shown while the ML model is loading.
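The Confidence Threshold and TopK parameters amount to a score filter followed by a top-K cut. This Python sketch is illustrative only; the template’s MLController script is the reference:

```python
def filter_detections(detections, confidence_threshold=0.4, top_k=10):
    """Drop detections scoring below the threshold, then keep the
    top_k highest-scoring ones. Each detection is a (score, box) pair."""
    kept = [d for d in detections if d[0] >= confidence_threshold]
    kept.sort(key=lambda d: d[0], reverse=True)
    return kept[:top_k]
```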
Note: The machine learning model proposes bounding boxes of a certain class based on the input image. This script applies a Non-Maximum Suppression (NMS) algorithm to filter and post-process those detections: if the Intersection over Union (IOU) of two detected boxes is higher than the threshold, they are considered the same box.
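For reference, here is a minimal Python sketch of greedy NMS with IOU. It illustrates the general algorithm, not the template’s actual implementation:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, suppress boxes that
    overlap it above the IOU threshold, and repeat on the rest.
    Each detection is a (score, box) pair."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best[1], d[1]) < iou_threshold]
    return kept
```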
Customizing Your Lens Experience
Object Detection Controller
The Object Detection Controller contains the ObjectDetectionController script, which takes the processed detection boxes from the MLController script, instantiates the corresponding number of detection boxes, and controls their Screen Transform components.
The Counter object is the Text Component used to display the number of objects detected at the current moment. The Object To Copy is the object to duplicate; it must have a Screen Transform component. By default it is set to the Detection Box[EDIT_CHILDREN] scene object.
Smoothing determines the smoothing applied to the detection box anchor positions. The higher the number, the more smoothly and slowly the detection screen transforms move. A Smoothing of 0.0 means no smoothing.
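One common way to implement this kind of smoothing is to interpolate each frame between the previous anchor position and the new target. The exact weighting used by the template’s script is an assumption here; this is a sketch of the general idea:

```python
def smooth_anchor(previous, target, smoothing):
    """Blend the previous anchor position toward the new target.
    smoothing = 0.0 jumps straight to the target (no smoothing);
    values closer to 1.0 move more slowly toward it."""
    s = min(max(smoothing, 0.0), 1.0)  # clamp to [0, 1]
    return s * previous + (1.0 - s) * target
```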
Hint Controller is the HintController script that controls the hint displayed when an object is not detected.
You can optionally fine-tune additional settings that help smooth the detection box positions on the screen by ticking the Advanced checkbox:
Matching Threshold sets the breakpoint ratio of the intersection of two processed detection boxes over their union (IOU) that determines whether the two boxes are considered different or the same.
Lost Frame Threshold determines how many frames the instantiated visual element is kept once the current detection is considered lost. A larger Lost Frame Threshold means it takes longer for a box to be removed after its object is lost; a smaller value gives more instantaneous updates, while a larger one produces a smoother result.
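The lost-frame behavior can be pictured as a per-box counter that resets whenever the box is matched to a fresh detection. This Python sketch is illustrative, not the template’s code:

```python
def update_lost_counter(matched, lost_frames, lost_frame_threshold=5):
    """Track how long a detection box survives without a match.
    Returns (new_lost_frames, keep_box): keep_box is False once the
    box has been unmatched for more than lost_frame_threshold frames."""
    if matched:
        return 0, True                       # reset the counter on a match
    lost_frames += 1
    return lost_frames, lost_frames <= lost_frame_threshold
```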
Detection Box [EDIT_CHILDREN] is the object that is duplicated by the ObjectDetectionController script for each detected object.
Note: The Object Detection model provides information about the detection box positions in screen space, so the Detection Box object uses a Screen Transform component.
Using Screen Transform allows you to create a complex layout of 2D elements (Screen Images and Screen Text) within it. The anchors of the Detection Box’s Screen Transform are driven by a script, and all child Screen Transforms adapt according to their setup.
Tip: To see how your detection box would respond to objects of different sizes, click the Detection Box [EDIT_CHILDREN] object and manipulate its anchors to see how this affects its children:
The Detection Box object provided in the template has several child objects:
- Small Hint (swap the texture on this object’s Image Component). This image uses the Pin To Edge and Fix Size options so that its size and position stay constant on the screen.
- Frame. The frame is the visual surrounding the detection boxes. It is built from 8 parts: one for each edge and one for each corner, each using a different combination of Pin To Edge settings.
Refer to the Screen Transform guide to set up your custom children layout.
Tip: You can also swap the textures used for the frame. In the Resources panel, right-click a texture and select Relink to New Source. You can also modify the size of each frame part by changing the Padding setting of its Screen Transform component.
To customize the hint shown when an object is not detected, select the Hint [EDIT_CHILDREN] object in the Objects panel.
The HintController script exposes an API that lets other scripts show and hide the HintSceneObject. To keep the hint from popping up constantly when detections are noisy and disappear for a couple of frames, you can set the MinLostFrames parameter: if no detections are found for this number of frames, the hint is shown.
Tick the HideOnCapture checkbox if you don’t want the hint to appear on the final Snap.
Modify the Hint [EDIT_CHILDREN] object’s hierarchy to change what the hint displays.
To change the image displayed, swap the Texture parameter of the Image Component of the Big Hint [EDIT_ME] object. Similarly, to change the text, edit the Text Component of the Hint Text [EDIT_ME] object.
Previewing Your Lens
You’re now ready to preview your Lens! To preview your Lens in Snapchat, follow the Pairing to Snapchat guide.
Please refer to the guides below for additional information:
Still looking for help? Visit Support