
1. Supervised Baseline

Report documenting our efforts to build a robust image classifier that will act as our baseline.
Created on May 27 | Last edited on May 29

Baseline

This is the bare-bones classifier with default settings and no regularization; a rough sketch of the setup follows the notes below.
  • Baseline trained for 10 epochs vs. 20 epochs.
  • Training longer helped a bit, but the overfitting is clearly severe.
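As a rough sketch of what this baseline looks like: the framework (PyTorch + torchvision), the pretrained weights, and the optimizer settings are all assumptions here; the report only states a ResNet50 backbone with default settings and no regularization.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # placeholder: set to the dataset's actual class count

def build_baseline(num_classes: int = NUM_CLASSES) -> nn.Module:
    """ResNet50 backbone with a fresh linear head and no extra regularization."""
    # Starting from ImageNet-pretrained weights is an assumption.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_baseline()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # optimizer and lr are assumptions
```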

[Charts: training metrics vs. step for baseline-e10 and baseline-e20]


Baseline + Regularize Backbone

Here we regularize the ResNet50 backbone with an L2 penalty (weight = 0.0001); a sketch of how this could be wired up follows the notes below.
  • Slight reduction in overfitting.
  • Regularization reduced the val top-1 accuracy by ~1%.
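A minimal sketch of a backbone-only L2 penalty, assuming the PyTorch setup from the baseline sketch above; with plain SGD, weight decay is equivalent (up to a constant factor) to an explicit L2 term in the loss. Leaving the classification head unregularized and the learning rate shown are assumptions.

```python
import torch

# `model` is the baseline ResNet50 from the sketch above; its classification head is `fc`.
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
head_params = [p for n, p in model.named_parameters() if n.startswith("fc.")]

optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "weight_decay": 1e-4},  # L2 penalty on the backbone only
        {"params": head_params, "weight_decay": 0.0},       # head left unregularized (assumption)
    ],
    lr=0.01,  # placeholder learning rate
    momentum=0.9,
)
```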



Baseline + Hyperparameters from Paper

We used the hyperparameters from the authors of A Realistic Evaluation of Semi-Supervised Learning, who won the Kaggle competition whose dataset we are using.
The exact values are captured in the run configs below; a few notes (a config sketch follows this list):
  • We compare the baseline (baseline-e10) with the baseline plus the paper's hyperparameters (baseline-l2-paper).
  • Not to be confused: in the paper the backbone is regularized, but we also experimented without the regularizer, hence the additional run name baseline-paper.
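A hedged sketch of how such a run could be configured and logged to W&B. Only the two values mentioned elsewhere in this report are filled in (the 0.0001 L2 weight above and the 0.0045 learning rate from the Learning Rate section); the project name and remaining fields are placeholders, not the paper's full hyperparameter set.

```python
import wandb

config = {
    "backbone": "resnet50",
    "learning_rate": 0.0045,  # from the paper (see the Learning Rate section below)
    "l2_weight": 1e-4,        # backbone L2 weight from the previous section
    "epochs": 10,             # matches baseline-e10; other paper hyperparameters are not reproduced here
}

run = wandb.init(project="supervised-baseline", name="baseline-l2-paper", config=config)
```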


  • The hyperparameters from the paper give a ~10% increase in val top-1 accuracy.
  • Training also converges better.



Applying Augmentation

Now let's keep the hyperparameters and regularization from the last experiment fixed and apply augmentation policies (the pipeline is sketched after the list below).
  • RandomResizedCrop (default ImageNet parameters) and random flips (horizontal + vertical) were used.
  • Augmentation does act as a regularizer.
  • But the gain is only 0.0020% 🥺.
  • We might need to add more augmentation policies and tune RandomResizedCrop. We could try the following:
    • Color jitter
    • Mixup
    • Cutout
    • AugMix
    • RandAugment (want to try this)
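A sketch of the augmentation pipeline described above, assuming torchvision transforms; the 224 input size and ImageNet normalization are assumptions on top of what the report states.

```python
from torchvision import transforms

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),       # default scale=(0.08, 1.0) and ratio=(3/4, 4/3)
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),  # normalization is an assumption
])

# Some candidates from the list above already ship with torchvision, e.g.:
# transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4), transforms.RandAugment()
```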



Class Weights

Using class weights to regularize the loss (a sketch of the weighted loss follows the notes below).
Let's first see the effect on the baseline.
  • Big reduction in overfitting; the class weights act as a strong regularizer.
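A minimal sketch of a class-weighted loss, assuming PyTorch; the report does not say how the weights were computed, so inverse-frequency ("balanced") weighting and the `train_labels` array are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def inverse_frequency_weights(labels: np.ndarray, num_classes: int) -> torch.Tensor:
    """'Balanced' class weights: total_samples / (num_classes * count_per_class)."""
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    weights = counts.sum() / (num_classes * np.maximum(counts, 1))
    return torch.tensor(weights, dtype=torch.float32)

num_classes = 10  # placeholder: the dataset's class count
# `train_labels` is a hypothetical 1-D array of integer class labels for the training set.
class_weights = inverse_frequency_weights(train_labels, num_classes)
criterion = nn.CrossEntropyLoss(weight=class_weights)
```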


Now let's see the effect on our best run so far (baseline-l2-paper-aug).
  • The accuracy dropped due to the regularization from the class weights.
  • We may need to train longer.



Learning Rate

Let's now look at the effect of the constant initial learning rate (a sweep sketch follows this list).
  • Learning rates of 0.1 and 0.0001 gave the worst val top-1 accuracy.
  • A learning rate of 0.01 was better.
  • A learning rate of 0.0045, taken from the paper, gave a much better result.
  • A learning rate of 0.001 gave the best result, a val top-1 accuracy of ~20%; training longer (30 epochs) pushes it to ~22%.
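A sketch of how this constant-learning-rate sweep could be run and logged, assuming the PyTorch setup from the baseline sketch above; `train_one_run`, the project name, and the run names are hypothetical.

```python
import torch
import wandb

for lr in (0.1, 0.01, 0.0045, 0.001, 0.0001):
    run = wandb.init(
        project="supervised-baseline",    # hypothetical project name
        name=f"baseline-lr-{lr}",         # hypothetical run naming scheme
        config={"learning_rate": lr},
        reinit=True,
    )
    model = build_baseline()                                               # from the baseline sketch above
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)   # constant LR, no schedule
    train_one_run(model, optimizer, epochs=10)                             # hypothetical training-loop helper
    run.finish()
```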
