
Hyperparameter Tuning

Squeezing the best performance out of a model: what hyperparameters are and how they affect performance
If you've dabbled with Machine Learning/Deep Learning model training, chances are you're already familiar with hyperparameters. In case you're not, here is a wonderful blog post about model hyperparameters.
Hyperparameter tuning is one of the most important stages of a Deep Learning pipeline. This is where you tweak the hyperparameters and check their effect on model performance in order to get the best results out of the model. Trying out different combinations of hyperparameters often provides valuable insights into the model's performance and limitations. In the previous blog post, we used Weights & Biases to compare the performance of various models on a dataset and chose the one with the highest COCO metric (which turned out to be VFNet). In this blog, I'll tweak a few hyperparameters and observe their effect on model performance.

Overview

When dealing with image datasets for object detection tasks, a few hyperparameters that can be tuned are the batch size, the image size, and the learning rate. However, I've been using fastai's lr_find() learning rate finder to pick a good learning rate before training starts, so here I'll try different combinations of batch sizes and image sizes and see how the model performs. I'll monitor model performance using Weights & Biases. Larger batch sizes and image sizes usually result in better performance; let's see if we observe this effect.
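Since I rely on lr_find() instead of tuning the learning rate by hand, here is a minimal sketch of how it's typically used. Note that `learn` stands for the fastai learner built later in this post, and the exact fields of the returned suggestion vary across fastai versions.

# run a short learning-rate range test; fastai plots loss vs. learning rate
suggestion = learn.lr_find()
# on recent fastai versions this prints something like SuggestedLRs(valley=...)
print(suggestion)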
I'll try batch sizes of 8, 16, and 24 and image sizes of 512 and 640. Before we begin, I'd like to recall the hyperparameter values and the performance of the corresponding model from the first blog post: the image size was 384, the batch size was 8, and the COCO metric was 0.72 after training for 20 epochs. Now let's get started with hyperparameter tuning. I'll be using IceVision in Google Colab.
Follow steps 1 through 2 from the first blog post to load and parse the data.
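For reference, here is a minimal sketch of what those steps look like. The directory paths and the 80/20 split are assumptions on my part; the parser is IceVision's built-in Pascal VOC parser.

from icevision.all import *

# placeholder path: point this at wherever the road sign dataset lives
data_dir = Path('/content/road_sign_data')

# the road sign dataset ships Pascal VOC-style XML annotations
parser = parsers.VOCBBoxParser(annotations_dir=data_dir / 'annotations',
                               images_dir=data_dir / 'images')

# random 80/20 train/validation split
train_records, valid_records = parser.parse(data_splitter=RandomSplitter([0.8, 0.2]))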
Now we'll try different combinations of batch sizes and image sizes and see if the COCO metric improves with larger batch sizes and larger image sizes. All models with the different hyperparameter settings will be trained for 20 epochs. The following is a repetition of steps 3, 4, and 5 from the first blog post with different hyperparameter settings.
from icevision.all import *
from fastai.callback.wandb import WandbCallback
from fastai.callback.tracker import SaveModelCallback
from google.colab import files
import wandb

# COCO metric, as in the first blog post
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]

image_sizes = [512, 640]
batch_sizes = [8, 16, 24]

for image_size in image_sizes:
    # training transforms: augmentations + resize; validation: resize and pad
    train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512),
                                 tfms.A.Normalize()])
    valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size),
                                 tfms.A.Normalize()])

    # create Dataset objects
    train_ds = Dataset(train_records, train_tfms)
    valid_ds = Dataset(valid_records, valid_tfms)

    for batch_size in batch_sizes:
        # instantiate the model
        vf_model_type = models.mmdet.vfnet
        vf_backbone = vf_model_type.backbones.resnet50_fpn_mstrain_2x
        vf_model = vf_model_type.model(backbone=vf_backbone(pretrained=True),
                                       num_classes=len(parser.class_map))

        # data loaders for VFNet
        train_dl = vf_model_type.train_dl(train_ds, batch_size=batch_size,
                                          num_workers=4, shuffle=True)
        valid_dl = vf_model_type.valid_dl(valid_ds, batch_size=batch_size,
                                          num_workers=4, shuffle=False)

        # one W&B run per hyperparameter combination
        wandb.init(project='RoadSignDetection',
                   name=f'VFNet_{image_size}_{batch_size}',
                   reinit=True)

        learn = vf_model_type.fastai.learner(dls=[train_dl, valid_dl],
                                             model=vf_model,
                                             metrics=metrics,
                                             cbs=[WandbCallback(),
                                                  SaveModelCallback()])

        # sanity-check the learning rate with the LR range test
        learn.lr_find()

        # 1 frozen epoch followed by 20 unfrozen epochs
        learn.fine_tune(20, 2e-4, freeze_epochs=1)

        # save and download the model checkpoint
        ckpt_path = f'VFNet_Road_Sign_{image_size}_{batch_size}_checkpoint_full.pth'
        save_icevision_checkpoint(vf_model,
                                  model_name='mmdet.vfnet',
                                  backbone_name='resnet50_fpn_mstrain_2x',
                                  classes=parser.class_map.get_classes(),
                                  img_size=image_size,
                                  filename=ckpt_path,
                                  meta={'icevision_version': '0.12.0'})
        files.download(ckpt_path)
As mentioned before, I used batch sizes of 8, 16, and 24 with image sizes of 512 and 640. However, the Google Colab runtime ran out of memory while training the model with batch size 24 and image size 640, so that combination couldn't be completed. Hence, it is important to be mindful of resource limitations when trying different hyperparameters.
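Since each run saves a full IceVision checkpoint, a trained model can be restored later without re-training. Here's a minimal sketch using IceVision's checkpoint helper; the filename is just one of the checkpoints produced above.

from icevision.models.checkpoint import model_from_checkpoint

# rebuild the model (and its class map) from a saved checkpoint
checkpoint_and_model = model_from_checkpoint(
    'VFNet_Road_Sign_640_8_checkpoint_full.pth')
model = checkpoint_and_model['model']
model_type = checkpoint_and_model['model_type']
class_map = checkpoint_and_model['class_map']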

Results

The results can be seen in the graph below. The COCO metric for various hyperparameter combinations is summarized in the table below:


It is clear that larger image sizes and larger batch sizes do give better results. We can compare the results across batch sizes and image sizes to see how each hyperparameter affected the COCO metric.
If we look at the models with an image size of 512, a batch size of 8 resulted in a COCO metric of 0.787, whereas a batch size of 24 increased it to 0.793.
Similarly, if we look at the models with a batch size of 8, an image size of 512 led to a COCO metric of 0.787, while an image size of 640 improved it to 0.806 (nearly 2 points higher).
Compared to the model from the first blog post (image size 384, batch size 8), whose COCO metric was 0.72, the model with batch size 16 and image size 640 provides an improvement of nearly 8 points, which is remarkable.


Conclusion

By tuning two hyperparameters, we observed that model performance (assessed by the COCO metric) can improve substantially; this is also supported by the fact that both the training loss and the validation loss decreased for larger image sizes and larger batch sizes. Hence, the rule of thumb is to always try out a few models with different hyperparameter settings to squeeze out the best possible performance.