Findings
Ran a sweep of 84 runs over 7 hyperparameters.
Note: top models are AlexNet!
[Figure: sweep panels for training_stages, learning_rate, and bn_weight_decay]
These runs have very similar accuracy curves despite being distant in hyperparameter space.
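One way to make "distant in hyperparameter space" and "very similar accuracy curves" concrete is to compare a normalized distance between two configurations with the largest gap between their accuracy curves. The sketch below is only illustrative: it assumes the three hyperparameters named above are numeric, and the sweep ranges, run configurations, and accuracy values are hypothetical placeholders, not results from this sweep.

```python
import math


def normalized_distance(a, b, ranges, log_scale=()):
    """Euclidean distance between two configs after scaling each
    hyperparameter to [0, 1] over its sweep range."""
    total = 0.0
    for name, (lo, hi) in ranges.items():
        x, y = a[name], b[name]
        if name in log_scale:
            # Compare log-scaled parameters (e.g. learning rate) in log10 space.
            x, y, lo, hi = (math.log10(v) for v in (x, y, lo, hi))
        total += ((x - y) / (hi - lo)) ** 2
    return math.sqrt(total)


def max_curve_gap(curve_a, curve_b):
    """Largest absolute difference between two accuracy curves,
    compared epoch by epoch."""
    return max(abs(p - q) for p, q in zip(curve_a, curve_b))


# Hypothetical sweep ranges and runs (placeholders, not measured values).
ranges = {
    "training_stages": (1, 5),
    "learning_rate": (1e-4, 1e-1),
    "bn_weight_decay": (0.0, 1e-3),
}
run_a = {"training_stages": 1, "learning_rate": 1e-4, "bn_weight_decay": 0.0}
run_b = {"training_stages": 5, "learning_rate": 1e-1, "bn_weight_decay": 1e-3}
acc_a = [0.10, 0.30, 0.50, 0.55]
acc_b = [0.12, 0.29, 0.51, 0.56]

distance = normalized_distance(run_a, run_b, ranges, log_scale=("learning_rate",))
gap = max_curve_gap(acc_a, acc_b)
print(f"hyperparameter distance: {distance:.3f}, max accuracy gap: {gap:.3f}")
```

A large distance paired with a small gap is the pattern described above. Scaling each axis by its sweep range keeps a parameter with a wide numeric range from dominating the distance.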