1. Supervised Baseline
Report documenting our efforts to build a robust image classifier that will act as our baseline.
Created on May 27 | Last edited on May 29
Baseline
This is the bare-bones classifier with default settings and no regularization; a minimal model sketch follows the observations below.
- Baseline trained for 10 epochs vs. 20 epochs.
- Training for longer helped a bit, but it's clear that overfitting is severe.
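A minimal sketch of such a baseline, assuming a PyTorch/torchvision setup (the report doesn't state the framework, and the class count and pretrained weights are placeholders/assumptions):

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # hypothetical; set to the dataset's actual class count

def build_baseline(num_classes: int = NUM_CLASSES) -> nn.Module:
    # ResNet50 backbone; ImageNet-pretrained weights are an assumption here.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    # Swap the ImageNet head for one sized to our dataset; no regularization.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```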
[Run set: 2 runs]
Baseline + Regularize Backbone
Here we regularize the ResNet50 backbone with an L2 penalty (weight = 0.0001); a sketch follows the observations below.
- Slight reduction in overfitting.
- Regularization reduced the top-1 validation accuracy by ~1%.
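A sketch of backbone-only L2 regularization, assuming the PyTorch baseline above. With plain SGD, weight decay is equivalent to an L2 penalty, so we set weight decay = 0.0001 on the backbone parameters only (the learning rate and momentum here are illustrative, not from the report):

```python
import torch

model = build_baseline()
head_params = list(model.fc.parameters())
head_ids = {id(p) for p in head_params}
backbone_params = [p for p in model.parameters() if id(p) not in head_ids]

optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "weight_decay": 1e-4},  # regularized backbone
        {"params": head_params, "weight_decay": 0.0},       # unregularized head
    ],
    lr=0.01,
    momentum=0.9,
)
```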
[Run set: 2 runs]
Baseline + Hyperparameters from Paper
We used the hyperparameters from the authors of A Realistic Evaluation of Semi-Supervised Learning, who won the Kaggle competition whose dataset we are using.
Below are the hyperparameters:
- We are comparing the baseline (baseline-e10) with the baseline plus hyperparameters from the paper (baseline-l2-paper).
- Not to be confused: in the paper the backbone is regularized, but we also experimented without the regularizer, hence the run name baseline-paper.
[Run set: 3 runs]
- The hyperparameters from the paper give a 10% increase in val_top@1 accuracy.
- Training converges better.
[Run set: 3 runs]
Applying Augmentation
Now let's keep the hyperparameters and regularization from the last experiment fixed and apply augmentation policies.
- RandomResizedCrop (default ImageNet parameters) and random flips (horizontal + vertical) were used.
- Augmentation acts as a regularizer.
- A gain of only 0.0020% 🥺.
- We might need to add more augmentation policies and tune RandomResizedCrop. We can try the following augmentations (the pipeline we applied is sketched after this list):
- Color jitter
- Mixup
- Cutout
- AugMix
- RandAugment (wanna try this)
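For reference, a sketch of the pipeline we applied above (RandomResizedCrop with its defaults plus horizontal and vertical flips), assuming torchvision transforms and standard ImageNet normalization:

```python
from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),   # defaults: scale=(0.08, 1.0), ratio=(3/4, 4/3)
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```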
[Run set: 2 runs]
Class Weights
Using class weights to regularize the loss; a sketch of the weighting scheme follows the observations below.
Let's first see the effect on the baseline.
- Big reduction in overfitting.
- Strong regularization effect.
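A sketch of one common weighting scheme (inverse frequency, as in scikit-learn's class_weight="balanced"); the report doesn't specify the exact formula, and the labels below are a toy example:

```python
import torch
import torch.nn as nn

# Toy labels; in practice, use the integer labels of the training set.
train_labels = torch.tensor([0, 0, 0, 1, 2, 2])
num_classes = 3
counts = torch.bincount(train_labels, minlength=num_classes).float()
# Inverse-frequency weights: rare classes contribute more to the loss.
weights = counts.sum() / (num_classes * counts.clamp(min=1.0))
criterion = nn.CrossEntropyLoss(weight=weights)
```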
[Run set: 2 runs]
Now let's see the effect on our best training result so far (baseline-l2-paper-aug).
- The accuracy dropped due to the regularization from class weights.
- We may need to train longer.
[Run set: 2 runs]
Learning Rate
Let's now see the effect of different constant initial learning rates; a minimal sweep sketch follows the list below.
- Learning rates of 0.1 and 0.0001: these gave the worst validation top-1 accuracy.
- Learning rate of 0.01: this was better.
- Learning rate of 0.0045: this value was taken from the paper and gave a much better result.
- Learning rate of 0.001: this gave the best result, a validation top-1 accuracy of ~20%. Training for longer (30 epochs) brings it to ~22%.
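A minimal sweep sketch, assuming the build_baseline helper from the baseline sketch above (the momentum value and the omitted training loop are assumptions):

```python
import torch

for lr in [0.1, 0.01, 0.0045, 0.001, 0.0001]:
    model = build_baseline()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    # train_and_evaluate(model, optimizer)  # hypothetical training loop
```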
[Run set: 5 runs]