
ALUM - Alpha Scheduling Experiments

Created on May 1|Last edited on May 1

Results

All tests used a learning rate (lr) of 2e-5 with the One Cycle schedule applied to both lr and weight decay (wd).
Each reported figure is the average of 3 training runs; the same 3 random seeds were used for every experiment group.
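As a minimal sketch of the averaging described above, the snippet below groups runs by experiment group, checks that every group used the same seeds, and averages each metric. The record layout (`group`, `seed`, `accuracy`, `valid_loss` keys) and the function name `average_by_group` are assumptions made for illustration, not the report's actual logging format.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(runs, metrics=("accuracy", "valid_loss")):
    """Average each metric over the seeds of every experiment group.

    `runs` is a list of dicts with hypothetical keys:
    'group', 'seed', and one key per metric.
    """
    by_group = defaultdict(list)
    for run in runs:
        by_group[run["group"]].append(run)

    # Comparing groups is only fair if they share the same set of seeds.
    seed_sets = {frozenset(r["seed"] for r in rs) for rs in by_group.values()}
    if len(seed_sets) > 1:
        raise ValueError("experiment groups were not run with the same seeds")

    return {
        group: {m: mean(r[m] for r in rs) for m in metrics}
        for group, rs in by_group.items()
    }


# Placeholder values for illustration only, not results from this report.
example_runs = [
    {"group": "demo", "seed": 1, "accuracy": 0.90, "valid_loss": 0.30},
    {"group": "demo", "seed": 2, "accuracy": 0.92, "valid_loss": 0.20},
    {"group": "demo", "seed": 3, "accuracy": 0.94, "valid_loss": 0.10},
]
print(average_by_group(example_runs))
```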
TL;DR
Best-performing settings:
  1. Flat Cosine alpha=1.0 schedule performed best, 0.9422 accuracy, 0.1779 valid loss
  2. One Cycle alpha=1.0 schedule was the next best, 0.9419 accuracy, 0.1785 valid loss
  3. Flat Cosine alpha=0.5 schedule was third, 0.9417 accuracy, 0.1968 valid loss
  4. Baseline, constant alpha=1.0, was fourth, 0.9416 accuracy, 0.1565 valid loss
Using a constant alpha of 1.0, as in the paper (the baseline run), gave strong results while being the simplest to implement. A Flat Cosine schedule gave the best result, with a margin of 0.0006 accuracy over the baseline.
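The report does not spell out the exact shape of the alpha schedules, so the following is only a sketch under stated assumptions. It assumes Flat Cosine means holding alpha at its peak for a fraction of training and then cosine-annealing it, and One Cycle means a cosine warm-up to the peak followed by a cosine anneal. The function names, the hold fraction of 0.75, the warm-up fraction of 0.25, and the start and end values of 0.0 are all hypothetical choices, not settings taken from these experiments.

```python
import math


def _cos_interp(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) * 0.5 * (1.0 + math.cos(math.pi * pct))


def constant_alpha(step, total_steps, alpha=1.0):
    """Baseline: the same alpha at every step."""
    return alpha


def flat_cosine_alpha(step, total_steps, alpha_max=1.0, alpha_end=0.0,
                      flat_frac=0.75):
    """Hold alpha_max for `flat_frac` of training, then cosine-anneal.

    flat_frac and alpha_end are assumed values.
    """
    if total_steps <= 1:
        return alpha_max
    pct = step / (total_steps - 1)
    if pct <= flat_frac:
        return alpha_max
    anneal_pct = (pct - flat_frac) / (1.0 - flat_frac)
    return _cos_interp(alpha_max, alpha_end, anneal_pct)


def one_cycle_alpha(step, total_steps, alpha_max=1.0, alpha_start=0.0,
                    alpha_end=0.0, warm_frac=0.25):
    """Cosine warm-up to alpha_max, then cosine anneal to alpha_end.

    warm_frac, alpha_start and alpha_end are assumed values.
    """
    if not 0.0 < warm_frac < 1.0:
        raise ValueError("warm_frac must be strictly between 0 and 1")
    if total_steps <= 1:
        return alpha_max
    pct = step / (total_steps - 1)
    if pct < warm_frac:
        return _cos_interp(alpha_start, alpha_max, pct / warm_frac)
    return _cos_interp(alpha_max, alpha_end, (pct - warm_frac) / (1.0 - warm_frac))
```

In a training loop, the value returned for the current step would multiply the adversarial loss term, for example `loss = task_loss + alpha * adv_loss`, where `task_loss` and `adv_loss` are placeholder names.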


Valid Loss

Interestingly, the best-performing schedules had a higher valid loss than the baseline (0.1779 to 0.1968 versus 0.1565), as can be seen below, which suggests that a little overfitting actually helped the final accuracy of the model.

[Chart: valid loss. Run set: 18 runs]


Accuracy



[Chart: accuracy. Run set: 33 runs]