ALUM - Alpha Scheduling Experiments
Created on May 1|Last edited on May 1
Results
All tests used lr = 2e-5 with the One Cycle schedule for both lr and wd.
Each reported result is the average of 3 training runs; the same 3 random seeds were used for all experiment groups.
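The averaging step described above can be sketched as follows. This is a minimal illustration, not the code used for these experiments: the function name `average_over_seeds`, the group name, and all numbers in `example` are hypothetical placeholders.

```python
from statistics import mean


def average_over_seeds(runs_by_group):
    """Average every metric over the per-seed runs of each experiment group.

    `runs_by_group` maps a group name to a list of metric dicts, one per seed.
    """
    summary = {}
    for group, runs in runs_by_group.items():
        metric_names = runs[0].keys()
        summary[group] = {
            name: mean(run[name] for run in runs) for name in metric_names
        }
    return summary


# Placeholder values for illustration only, NOT results from this report.
example = {
    "hypothetical_group": [
        {"accuracy": 0.90, "valid_loss": 0.30},  # seed 1
        {"accuracy": 0.91, "valid_loss": 0.28},  # seed 2
        {"accuracy": 0.92, "valid_loss": 0.26},  # seed 3
    ],
}

averaged = average_over_seeds(example)
```

Reusing the same seeds across groups means each group sees the same initialisations and data order, so differences in the averages are more likely to come from the alpha schedule than from seed noise.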
TL;DR
Best-performing settings:
- Flat Cosine alpha=1.0 schedule performed best: 0.9422 accuracy, 0.1779 valid loss
- One Cycle alpha=1.0 schedule was the next best: 0.9419 accuracy, 0.1785 valid loss
- Flat Cosine alpha=0.5 schedule was third: 0.9417 accuracy, 0.1968 valid loss
- Baseline, constant alpha=1.0, was fourth: 0.9416 accuracy, 0.1565 valid loss
Using a constant alpha of 1.0, as in the paper (the baseline run), gave strong results while being the simplest to implement. A Flat Cosine schedule gave the best result, although its margin over the baseline was small (0.9422 vs 0.9416 accuracy).
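For readers unfamiliar with the schedule names, the sketch below shows one common reading of a Flat Cosine schedule next to the constant baseline. It rests on assumptions not stated in this report: that alpha is held at its peak for the first 75% of training and then cosine-annealed to 0, and that alpha weights the adversarial term in the loss. The function names and the `flat_frac` and `final` parameters are hypothetical.

```python
import math


def constant_alpha(step, total_steps, peak=1.0):
    """Baseline: alpha stays at `peak` for the whole run."""
    return peak


def flat_cosine_alpha(step, total_steps, peak=1.0, flat_frac=0.75, final=0.0):
    """Hold `peak` for the first `flat_frac` of training, then cosine-anneal to `final`.

    Assumption: flat_frac=0.75 and final=0.0 are illustrative defaults,
    not settings confirmed by this report.
    """
    flat_steps = int(total_steps * flat_frac)
    if step < flat_steps:
        return peak
    # Fraction of the annealing phase completed, clamped to [0, 1].
    anneal_steps = max(1, total_steps - 1 - flat_steps)
    progress = min((step - flat_steps) / anneal_steps, 1.0)
    return final + (peak - final) * 0.5 * (1.0 + math.cos(math.pi * progress))


total = 100
alphas = [flat_cosine_alpha(s, total) for s in range(total)]

# Assumed usage inside a training step (names are placeholders):
#   loss = task_loss + alphas[step] * adversarial_loss
```

With these assumed defaults, alpha equals 1.0 for the first 75 steps, passes through 0.5 midway through the annealing phase, and reaches 0.0 on the final step.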
Valid Loss
Interestingly, the best-performing schedules had a higher valid loss than the baseline, as can be seen below. This is likely because a little overfitting actually helped the final accuracy of the model.
[Chart: Valid Loss, run set of 18]
Accuracy
[Chart: Accuracy, run set of 33]