Choosing the right model for object detection
When working on an object detection problem, choosing a model from the many available options can be tricky. It doesn't have to be. Let's see how to figure things out and make the best decision.
Introduction
One of the most common hurdles in an object detection problem is choosing a model that will do well on a custom dataset. But how do you decide which model to use when there are so many to choose from? Research and development in deep learning has produced a wide range of models, which makes selecting one somewhat difficult and time-consuming.
One way to approach this is to compare the models' published results and metrics on standard datasets like COCO or Pascal VOC. This idea might seem right at first glance, as this is how the performance of a new model is benchmarked against others. However, the performance of a model substantially depends upon the objects in the dataset: a model may not perform as well on a custom dataset as it did on a standard benchmark. Hence, the best approach is to try a few models, compare their performance metrics, and see which one works best for you. That is exactly what we'll be doing today.
Approach
Today we'll try out a few different models by comparing their performance on a custom dataset. One of the most important steps is to visualize the performance metrics as we go, to get a good idea of what's working and what's not. We'll use Weights & Biases (W&B) to log and visualize those metrics. Furthermore, the choice of which metric to visualize and compare depends on the task at hand, the objective of the project, and, to a certain extent, on the constraints and available resources.
The dataset that we'll be using today is the Road Sign Dataset from Kaggle. Let us assume that this task of detecting road signs is part of a perception algorithm for autonomous vehicles. Which metric or which aspect of performance would be important in this case? Which metric would you want to compare across different models to choose the best one?
For a self-driving vehicle, it is important to recognize all the road signs and act in accordance with traffic rules and regulations. Hence, we want our model to detect every road sign that may be presented to it. In other words, we want the model to have high accuracy (ideally 100%). For this task, we will use the COCO metric to compare our models. We'll also keep an eye on the training loss and validation loss.
Now that we've finalized our approach and a metric to compare models, let's get our hands dirty with the code. We'll be using IceVision, which simplifies the object detection pipeline, along with fast.ai to train our models, and we'll log metrics using the WandB callback. So, open up your Jupyter notebook.
Workflow
- Installations and imports
- Data Collection
- Data Parsing
- Creating Data Augmentations and Transformations
- Training models (and Comparing metrics)
- Selecting the best fit (Model)
1. Installations and imports
Let's fetch IceVision from the Github repo and install it for an appropriate target (CPU/GPU). Make sure you restart the kernel after installing IceVision.
# Download IceVision
!wget https://raw.githubusercontent.com/airctic/icevision/master/icevision_install.sh

# Choose your installation target: cuda11 or cuda10 or cpu
!bash icevision_install.sh cuda11
Once that is taken care of, we'll import the WandB callback from fast.ai and everything from IceVision. If you want, you can also import SaveModelCallback to save the best model. If you're using Google Colab, don't forget to import files, as it will come in handy when fetching the dataset from Kaggle.
# Import the WandB callback and SaveModelCallback from fast.ai
from fastai.callback.wandb import *
from fastai.callback.tracker import SaveModelCallback

# Import everything from IceVision
from icevision.all import *
from icevision.models import *

# Import files from Google Colab
from google.colab import files
2. Data Collection
Since we'll be using the Road Sign Dataset from Kaggle, we'll need to install the Kaggle API.
!pip install kaggle
Once the Kaggle API is installed, you'll need to upload the API token, a .json file containing your API credentials. On Google Colab, you can upload the token with files.upload() and then move it into place with a few shell commands, as shown below.
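A minimal sketch of the upload step (this assumes you're on Colab, with the files import from earlier, and that you've already downloaded a kaggle.json token from your Kaggle account page):
# Upload kaggle.json from your local machine to the Colab runtime
# (files.upload() opens a file picker and returns a dict of the uploaded files)
uploaded = files.upload()
Then move the token into place: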
!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
The commands above move the API token to the directory where the Kaggle CLI expects to find it and restrict the file permissions on the token. Now, fetch the dataset from Kaggle (a zip file), make a new directory, and extract the data into it.
# Fetch the dataset
!kaggle datasets download 'andrewmvd/road-sign-detection'

# Create a new directory
!mkdir 'road_sign'

# Extract the dataset files
!unzip 'road-sign-detection.zip' -d 'road_sign'
Now that we have our data, the next step is to parse it.
3. Data Parsing
IceVision makes it very easy to parse object detection data with the VOCBBoxParser from the parsers module.
parser = parsers.VOCBBoxParser(annotations_dir="road_sign/annotations", images_dir="road_sign/images")
train_records, valid_records = parser.parse()
parser.class_map
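By default, parse() splits the records into training and validation sets for you. If you want explicit control over the split, IceVision also lets you pass a data splitter. Here's a minimal sketch, assuming IceVision's RandomSplitter helper and an 80/20 split (the ratio and seed are just illustrative choices):
# Optional: control the train/valid split explicitly
data_splitter = RandomSplitter([0.8, 0.2], seed=42)
train_records, valid_records = parser.parse(data_splitter=data_splitter)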
4. Creating Augmentations and Transformations
IceVision leverages the Albumentations library to perform data transformations for augmentation. aug_tfms randomly applies a set of useful transformations to the training images. These transformations are applied to the training set only; the validation set is only resized and padded.
# Augmentation by transformations
image_size = 384

# Create the transformations
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Apply the transformations to the datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)

# Data loaders (note: each model type provides its own dataloader constructors,
# so vf_model_type must already be defined -- see the model creation step below)
train_dl = vf_model_type.train_dl(train_ds, batch_size=16, num_workers=4, shuffle=True)
valid_dl = vf_model_type.valid_dl(valid_ds, batch_size=16, num_workers=4, shuffle=False)
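It's worth sanity-checking the augmentations before training. A small sketch, assuming IceVision's show_samples helper is available from the icevision.all import:
# Render a few augmented training samples to verify the transformations look reasonable
samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)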
Now that our data is ready to be trained on, the next step is to select a couple of interesting and promising models to compare the performance metrics.
5. Training models
For today's task, we'll test and compare the performance of 4 different promising object detection models:
- EfficientDet
- YoloV5
- DETR
- VFNet
The first step is to create and instantiate the models. A model is created and instantiated in 3 steps:
- Select the model type
- Select appropriate backbone
- Instantiate the model
The code block below creates and instantiates the 4 models of interest.
# VFNet
vf_model_type = models.mmdet.vfnet
vf_backbone = vf_model_type.backbones.resnet50_fpn_mstrain_2x
vf_model = vf_model_type.model(backbone=vf_backbone(pretrained=True), num_classes=len(parser.class_map))

# DETR
d_model_type = models.mmdet.detr
d_backbone = d_model_type.backbones.r50_8x2_150e_coco
d_model = d_model_type.model(backbone=d_backbone(pretrained=True), num_classes=len(parser.class_map))

# YoloV5
extra_args = {}
y5_model_type = models.ultralytics.yolov5
y5_backbone = y5_model_type.backbones.small
# The yolov5 model requires an img_size parameter
extra_args['img_size'] = image_size
y5_model = y5_model_type.model(backbone=y5_backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)

# EfficientDet
extra_args = {}
e_model_type = models.ross.efficientdet
e_backbone = e_model_type.backbones.tf_lite0
# The efficientdet model requires an img_size parameter
extra_args['img_size'] = image_size
e_model = e_model_type.model(backbone=e_backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)
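One more piece before wiring up the learners: the training code below expects a metrics list. A minimal sketch, assuming IceVision's built-in COCOMetric (the bounding-box variant), which matches the metric we decided to compare on:
# Metric used by all the learners below: IceVision's COCO metric on bounding boxes
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]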
The models are now ready to be trained. Before we start training, we need to initialize W&B. You can log and visualize metrics on W&B either by creating an account or by using it anonymously. To initialize W&B, we'll use the following command:
# Initialize WandB
wandb.init(project='RoadSignDetection', name='VFNet')
This will initialize the W&B interface and output a link to the dashboard where you can see charts of the logged metrics.
It is really important to understand the project and name arguments of the init method. The project argument assigns a title to the project you are currently working on. A project can contain metrics and logs for a single model or for multiple models; the data for each model is stored in what is called a "run". The name argument assigns a title to that run. Hence, for every new model we train, we'll initialize W&B with the same project name but a different run name. We'll start with VFNet, so for this run we set project='RoadSignDetection' and name='VFNet' as shown above. For every subsequent run, we'll re-initialize W&B with the additional argument reinit=True.
After initializing W&B, we'll start training the VFNet model using fast.ai with two callbacks, WandbCallback() and SaveModelCallback(), to log metrics and save the best model, as shown below:
vf_learn = vf_model_type.fastai.learner(
    dls=[train_dl, valid_dl],
    model=vf_model,
    metrics=metrics,
    cbs=[WandbCallback(), SaveModelCallback()]
)

# Find the learning rate
vf_learn.lr_find()

# Train the model
vf_learn.fine_tune(20, 2e-04, freeze_epochs=1)
The other models are trained in exactly the same way. The only additional thing to keep in mind is to re-initialize W&B for each new run and change the run name. If you train YoloV5 in the next run, re-initialize W&B as shown below, then build that model's data loaders and learner just as we did for VFNet (a sketch follows the re-init snippet):
# Re-initialize WandB
wandb.init(project='RoadSignDetection', name='YoloV5', reinit=True)
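For completeness, here is a minimal sketch of the YoloV5 run, assuming the y5_model_type and y5_model created earlier, the same datasets and metrics list, and hyperparameters simply mirroring the VFNet run. Note that the data loaders are rebuilt with the YoloV5 model type, since each model type expects batches in its own format:
# Build YoloV5-specific data loaders from the same datasets
y5_train_dl = y5_model_type.train_dl(train_ds, batch_size=16, num_workers=4, shuffle=True)
y5_valid_dl = y5_model_type.valid_dl(valid_ds, batch_size=16, num_workers=4, shuffle=False)

# Create the learner and train, mirroring the VFNet run
y5_learn = y5_model_type.fastai.learner(
    dls=[y5_train_dl, y5_valid_dl],
    model=y5_model,
    metrics=metrics,
    cbs=[WandbCallback(), SaveModelCallback()]
)
y5_learn.lr_find()
y5_learn.fine_tune(20, 2e-04, freeze_epochs=1)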
Once all the models are trained, the various metrics can be visualized in Weights & Biases as line graphs. The best model can then be selected by comparing these metrics across runs.
6. Selecting the best fit
Now that our models are trained, let's take a look at the logged metrics in Weights & Biases. The graphs below compare three metrics, COCOMetric, training loss, and validation loss, for the 4 models we trained. It is clearly evident that the COCOMetric for VFNet is significantly higher than that of all the other models. At the same time, the training and validation losses of VFNet are very low. Hence, VFNet seems to be the best choice for the task at hand. We can now take VFNet and fine-tune it further to improve performance.
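Besides eyeballing the charts, you can also pull the final logged values for each run programmatically through the W&B public API. A small sketch, assuming the project name used above; 'your-entity' and the exact metric key are assumptions, so check a run page for the names W&B actually recorded:
# Query the final summary values of each run in the project
import wandb

api = wandb.Api()
for run in api.runs("your-entity/RoadSignDetection"):
    # The 'COCOMetric' key is an assumption -- verify the key name on the run page
    print(run.name, run.summary.get("COCOMetric"), run.summary.get("valid_loss"))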
[W&B panels: COCOMetric, training loss, and validation loss for the 4 runs]
For the task at hand, VFNet turned out to be the best of the lot: its performance as measured by COCOMetric was superior to the others, which was our main objective. Depending on the task, other metrics may take higher priority and could be used to select the model. For example, if a project must run with limited computing resources, the models could be compared on memory usage in addition to COCOMetric. In that case, the following graphs, which visualize CPU/GPU utilization and memory allocation, can be used to select the best fit.
[W&B panels: CPU/GPU utilization and memory allocation for the 4 runs]
Conclusion
To summarize, in this tutorial we saw how Weights & Biases can be used to log and visualize various metrics for different candidate models, so that we can choose the best fit and iterate on it for better performance.
If you would like to stay updated about the projects that I work on, feel free to connect/follow me on:
I hope this tutorial helped you learn and have fun at the same time. See you next time.