1. Supervised Baseline
Report documenting our efforts to build a robust image classifier that will act as our baseline.
Created on May 27 | Last edited on May 29
Baseline
This is the bare-bones classifier with default settings and no regularization; a minimal model sketch follows the observations below.
- Baseline trained for 10 epochs vs. 20 epochs.
- Training for longer helped a bit, but it's clear that overfitting is severe.
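A minimal sketch of such a baseline, assuming a PyTorch/torchvision setup (the report doesn't state the framework, and the class count and pretrained weights are placeholders/assumptions):

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # hypothetical; set to the dataset's actual class count

def build_baseline(num_classes: int = NUM_CLASSES) -> nn.Module:
    # ResNet50 backbone; ImageNet-pretrained weights are an assumption here.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    # Swap the ImageNet head for one sized to our dataset; no regularization.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```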
[Run set: 2 runs]
Baseline + Regularize Backbone
Here we regularize the ResNet50 backbone with an L2 penalty (weight = 0.0001); a sketch follows the observations below.
- Slight reduction in overfitting.
- Regularization reduced the top-1 validation accuracy by ~1%.
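A sketch of backbone-only L2 regularization, assuming the PyTorch baseline above. With plain SGD, weight decay is equivalent to an L2 penalty, so we set weight decay = 0.0001 on the backbone parameters only (the learning rate and momentum here are illustrative, not from the report):

```python
import torch

model = build_baseline()
head_params = list(model.fc.parameters())
head_ids = {id(p) for p in head_params}
backbone_params = [p for p in model.parameters() if id(p) not in head_ids]

optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "weight_decay": 1e-4},  # regularized backbone
        {"params": head_params, "weight_decay": 0.0},       # unregularized head
    ],
    lr=0.01,
    momentum=0.9,
)
```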
[Run set: 2 runs]
Baseline + Hyperparameters from Paper
We used the hyperparameters from the authors of A Realistic Evaluation of Semi-Supervised Learning, who won the Kaggle competition whose dataset we are using.
Below are the hyperparameters:
- We are comparing the baseline (baseline-e10) with the baseline plus hyperparameters from the paper (baseline-l2-paper).
- Not to be confused: in the paper the backbone is regularized, but we also experimented without the regularizer, hence the run name baseline-paper.
[Run set: 3 runs]
- The hyperparameters from the paper give a 10% increase in val_top@1 accuracy.
- Training converges better.
[Run set: 3 runs]
Applying Augmentation
Now let's keep the hyperparameters and regularization from the last experiment fixed and apply augmentation policies.
- RandomResizedCrop (default ImageNet parameters) and random flips (horizontal + vertical) were used.
- Augmentation acts as a regularizer.
- A gain of only 0.0020% 🥺.
- We might need to add more augmentation policies and tune RandomResizedCrop. We can try the following augmentations (the pipeline we applied is sketched after this list):
- Color jitter
- Mixup
- Cutout
- AugMix
- RandAugment (wanna try this)
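For reference, a sketch of the pipeline we applied above (RandomResizedCrop with its defaults plus horizontal and vertical flips), assuming torchvision transforms and standard ImageNet normalization:

```python
from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),   # defaults: scale=(0.08, 1.0), ratio=(3/4, 4/3)
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```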
[Run set: 2 runs]
Class Weights
Using class weights to regularize the loss; a sketch of the weighting scheme follows the observations below.
Let's first see the effect on the baseline.
- Big reduction in overfitting.
- Strong regularization effect.
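A sketch of one common weighting scheme (inverse frequency, as in scikit-learn's class_weight="balanced"); the report doesn't specify the exact formula, and the labels below are a toy example:

```python
import torch
import torch.nn as nn

# Toy labels; in practice, use the integer labels of the training set.
train_labels = torch.tensor([0, 0, 0, 1, 2, 2])
num_classes = 3
counts = torch.bincount(train_labels, minlength=num_classes).float()
# Inverse-frequency weights: rare classes contribute more to the loss.
weights = counts.sum() / (num_classes * counts.clamp(min=1.0))
criterion = nn.CrossEntropyLoss(weight=weights)
```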
[Run set: 2 runs]
Now let's see the effect on our best training result so far (baseline-l2-paper-aug).
- The accuracy dropped due to the regularization from class weights.
- We may need to train longer.
[Run set: 2 runs]
Learning Rate
Let's now see the effect of different constant initial learning rates; a minimal sweep sketch follows the list below.
- Learning rates of 0.1 and 0.0001: these gave the worst validation top-1 accuracy.
- Learning rate of 0.01: this was better.
- Learning rate of 0.0045: this value was taken from the paper and gave a much better result.
- Learning rate of 0.001: this gave the best result, a validation top-1 accuracy of ~20%. Training for longer (30 epochs) brings it to ~22%.
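A minimal sweep sketch, assuming the build_baseline helper from the baseline sketch above (the momentum value and the omitted training loop are assumptions):

```python
import torch

for lr in [0.1, 0.01, 0.0045, 0.001, 0.0001]:
    model = build_baseline()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    # train_and_evaluate(model, optimizer)  # hypothetical training loop
```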
[Run set: 5 runs]