ALUM - Alpha Scheduling Experiments

Created on May 1|Last edited on May 1
Results
All testing was carried out with lr = 2e-5, using the One Cycle schedule for both lr and wd.
Each result is the average of 3 training runs; the same 3 random seeds were used for all experiment groups.
TL;DR
Best-performing settings, ranked by accuracy:
Flat Cosine alpha=1.0 schedule performed best: 0.9422 accuracy, 0.1779 valid loss
One Cycle alpha=1.0 schedule was the next best: 0.9419 accuracy, 0.1785 valid loss
Flat Cosine alpha=0.5 schedule was third: 0.9417 accuracy, 0.1968 valid loss
Baseline, constant alpha=1.0, was fourth: 0.9416 accuracy, 0.1565 valid loss
A constant alpha of 1.0, as used in the paper (the baseline run), gave strong results while being the simplest to implement. The Flat Cosine schedule with alpha=1.0 gave the best accuracy overall, though only by 0.0006 over the baseline.
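For readers unfamiliar with the schedule shape, here is a minimal sketch of a flat-then-cosine schedule applied to alpha. It is an illustration, not the code used in these experiments: the function name `flat_cos_alpha`, the `pct_flat=0.75` split, and annealing down to `alpha_min=0.0` are all assumptions, since the report does not state the exact shape that was used.

```python
import math


def flat_cos_alpha(step, total_steps, alpha_max=1.0, alpha_min=0.0, pct_flat=0.75):
    """Hypothetical flat-cosine schedule for alpha.

    Holds alpha at alpha_max for the first pct_flat fraction of training,
    then cosine-anneals it down to alpha_min by the final step.
    All defaults are assumptions, not values taken from the report.
    """
    if total_steps <= 1 or pct_flat >= 1.0:
        # Degenerate cases: nothing to anneal, so stay flat.
        return alpha_max
    pct = step / (total_steps - 1)  # training progress in [0, 1]
    if pct < pct_flat:
        return alpha_max  # flat phase
    # Progress through the annealing phase, in [0, 1].
    t = (pct - pct_flat) / (1.0 - pct_flat)
    return alpha_min + (alpha_max - alpha_min) * 0.5 * (1.0 + math.cos(math.pi * t))


# Example: sample the schedule at a few points of a 100-step run.
schedule = [flat_cos_alpha(s, 100) for s in (0, 50, 90, 99)]
```

In a training loop, the returned value would be used as the weight on the adversarial term for that step; the constant-alpha baseline corresponds to always returning `alpha_max`.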
﻿
Valid Loss
Interestingly, the schedules with the best accuracy had a higher valid loss than the baseline. This suggests that a small amount of overfitting helped the final accuracy of the model.

[Chart: valid loss for each run set]

Accuracy

[Chart: accuracy for each run set]