Teacher--student (V2)

Created on November 17 | Last edited on November 29
Overall results

Results are obtained over 5 seeds each for the teacher, the student, and the expansion. Each of the 11 coefficient pairs on the 1-simplex (step size 0.1) is run with every combination of these seeds, so we effectively perform 5^3 = 125 runs per pair.
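The run count can be checked with a short enumeration. This is a minimal sketch; the helpers `simplex_pairs` and `run_configs` and the config keys `teacher_seed`, `student_seed` and `expansion_seed` are hypothetical names, not taken from the experiment code.

```python
import itertools

NUM_SEEDS = 5  # seeds per source of randomness (teacher, student, expansion)
STEP = 0.1     # grid step on the 1-simplex


def simplex_pairs(step: float = STEP) -> list[tuple[float, float]]:
    """Points (a, 1 - a) on the 1-simplex for a = 0, step, ..., 1."""
    n = round(1 / step)
    return [(round(i / n, 2), round(1 - i / n, 2)) for i in range(n + 1)]


def run_configs(num_seeds: int = NUM_SEEDS):
    """Yield one hypothetical config per (coefficient pair, seed triple)."""
    for coeffs in simplex_pairs():
        for t, s, e in itertools.product(range(num_seeds), repeat=3):
            yield {
                "coefficients": coeffs,
                "teacher_seed": t,
                "student_seed": s,
                "expansion_seed": e,
            }


configs = list(run_configs())
print(len(simplex_pairs()))  # 11 coefficient pairs
print(len(configs))          # 1375 = 11 * 125
```

Over all pairs this gives 11 × 125 = 1375 configurations.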
Experimental settings: Adam with default settings; minibatches of size 32; training for 1000 + 1000 epochs. Student initial parameters are all sampled from N(0, 1).
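The optimizer setup can be made concrete with a small NumPy sketch. It assumes "default settings" means the usual Adam defaults (lr = 1e-3, betas = (0.9, 0.999), eps = 1e-8, as in PyTorch's `torch.optim.Adam`); the report does not name a framework. The `Adam` class and `minibatches` helper are hypothetical, and the toy linear-regression target only exercises the update rule with minibatches of 32 and N(0, 1) student initialization; it is not the teacher used in the report.

```python
import numpy as np


class Adam:
    """Adam with the common defaults (lr=1e-3, betas=(0.9, 0.999), eps=1e-8)."""

    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params, self.lr, self.b1, self.b2, self.eps = params, lr, beta1, beta2, eps
        self.m = {k: np.zeros_like(v) for k, v in params.items()}  # first moments
        self.v = {k: np.zeros_like(v) for k, v in params.items()}  # second moments
        self.t = 0

    def step(self, grads):
        self.t += 1
        for k, g in grads.items():
            self.m[k] = self.b1 * self.m[k] + (1 - self.b1) * g
            self.v[k] = self.b2 * self.v[k] + (1 - self.b2) * g**2
            m_hat = self.m[k] / (1 - self.b1**self.t)  # bias correction
            v_hat = self.v[k] / (1 - self.b2**self.t)
            self.params[k] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def minibatches(n, batch_size, rng):
    """Yield shuffled index batches; the last batch may be smaller."""
    perm = rng.permutation(n)
    for start in range(0, n, batch_size):
        yield perm[start:start + batch_size]


def mse(params, X, Y):
    return float(np.mean((X @ params["W"] + params["b"] - Y) ** 2))


# Toy usage: linear student y = x @ W + b, all parameters drawn from N(0, 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
Y = X @ rng.uniform(-1, 1, size=(20, 10))
params = {"W": rng.normal(size=(20, 10)), "b": rng.normal(size=10)}
opt = Adam(params)

initial_loss = mse(params, X, Y)
for epoch in range(20):
    for idx in minibatches(len(X), 32, rng):
        xb, yb = X[idx], Y[idx]
        err = xb @ params["W"] + params["b"] - yb  # (batch, 10)
        scale = 2.0 / err.size                     # gradient of the mean squared error
        opt.step({"W": scale * xb.T @ err, "b": scale * err.sum(axis=0)})
final_loss = mse(params, X, Y)
print(initial_loss, final_loss)
```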
Data: generated by a teacher NN with layer sizes (20, 10, 10) and ReLU activations. The dataset has 1000/100/100 examples (train/validation/test). Teacher weights are sampled from Uniform(-1, 1).
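A minimal NumPy sketch of such a teacher and dataset, under assumptions not stated above: (20, 10, 10) is read as 20 inputs, 10 hidden ReLU units and 10 linear outputs; biases are drawn from the same Uniform(-1, 1) as the weights; inputs are standard normal. `make_teacher`, `teacher_forward` and `make_dataset` are hypothetical helpers.

```python
import numpy as np


def make_teacher(sizes=(20, 10, 10), rng=None):
    """Teacher MLP as a list of (W, b); all entries drawn from Uniform(-1, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    return [
        (rng.uniform(-1, 1, size=(d_in, d_out)), rng.uniform(-1, 1, size=d_out))
        for d_in, d_out in zip(sizes[:-1], sizes[1:])
    ]


def teacher_forward(layers, x):
    """ReLU after every hidden layer; the output layer is linear (assumed)."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def make_dataset(seed, n_train=1000, n_val=100, n_test=100):
    rng = np.random.default_rng(seed)
    teacher = make_teacher(rng=rng)
    n = n_train + n_val + n_test
    X = rng.normal(size=(n, 20))  # input distribution is an assumption
    Y = teacher_forward(teacher, X)
    splits = np.split(np.arange(n), [n_train, n_train + n_val])
    return {name: (X[idx], Y[idx]) for name, idx in zip(("train", "val", "test"), splits)}


data = make_dataset(seed=0)
print({name: xy[0].shape for name, xy in data.items()})
```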
Initialization: we add a small amount of noise, drawn from N(0, 0.001^2), to the new neuron's parameters to break ties.
Expansion: always 2 --> 3.
Activation: We consider two student activation functions: linear and ReLU.
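The initialization, expansion and activation settings can be sketched together in NumPy. This assumes, purely for illustration, that "2 --> 3" means widening a single hidden layer from 2 to 3 neurons; the report does not define it. It also leaves open how the new neuron's base parameters are chosen: `add_hidden_neuron` takes them as arguments, and the usage below copies hidden neuron 0 only as a hypothetical placeholder. Only the N(0, 1) student initialization, the N(0, 0.001^2) noise and the linear/ReLU choice come from the text.

```python
import numpy as np

ACTIVATIONS = {
    "linear": lambda z: z,
    "relu": lambda z: np.maximum(z, 0.0),
}


def init_student(d_in=20, width=2, d_out=10, rng=None):
    """One-hidden-layer student; every parameter drawn from N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    return {
        "W1": rng.normal(size=(d_in, width)), "b1": rng.normal(size=width),
        "W2": rng.normal(size=(width, d_out)), "b2": rng.normal(size=d_out),
    }


def student_forward(p, x, activation="relu"):
    h = ACTIVATIONS[activation](x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]


def add_hidden_neuron(p, w_in, b_in, w_out, noise_std=1e-3, rng=None):
    """Return a copy of p with one extra hidden neuron.

    w_in (d_in,), b_in (scalar) and w_out (d_out,) are the new neuron's base
    parameters; N(0, noise_std**2) noise is added to each to break ties.
    """
    rng = np.random.default_rng() if rng is None else rng

    def noisy(a):
        return np.asarray(a, dtype=float) + rng.normal(scale=noise_std, size=np.shape(a))

    return {
        "W1": np.column_stack([p["W1"], noisy(w_in)]),
        "b1": np.append(p["b1"], noisy(b_in)),
        "W2": np.vstack([p["W2"], noisy(w_out)]),
        "b2": p["b2"].copy(),
    }


rng = np.random.default_rng(0)
student = init_student(rng=rng)  # hidden width 2
# Hypothetical base for the new neuron: a copy of hidden neuron 0.
wider = add_hidden_neuron(student, student["W1"][:, 0], student["b1"][0],
                          student["W2"][0], rng=rng)
x = rng.normal(size=(5, 20))
for act in ("linear", "relu"):
    print(act, student_forward(wider, x, act).shape)
```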
Figure: Validation MSE during training (x-axis: epoch; y-axis: validation MSE). One curve per coefficient pair, (0.00, 1.00), (0.10, 0.90), ..., (1.00, 0.00), for each of the metrics val_mse_pretrain and val_mse_expand.
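Each curve in the plot summarizes several runs that share a coefficient pair. A minimal sketch of that kind of grouping, averaging per-epoch validation MSE across runs, is below; the plotting tool's exact aggregation is not stated in the report, `group_curves` is a hypothetical helper, and the toy numbers are placeholders, not results.

```python
from collections import defaultdict

import numpy as np


def group_curves(runs):
    """Mean and std of per-epoch curves across runs sharing a coefficient pair.

    runs: iterable of (coefficients, curve), where curve is a 1-D sequence of
    per-epoch validation MSE values; all curves in a group must have equal length.
    """
    grouped = defaultdict(list)
    for coeffs, curve in runs:
        grouped[tuple(coeffs)].append(np.asarray(curve, dtype=float))
    summary = {}
    for coeffs in sorted(grouped):
        stacked = np.stack(grouped[coeffs])  # (num_runs, num_epochs)
        summary[coeffs] = (stacked.mean(axis=0), stacked.std(axis=0))
    return summary


# Toy placeholder curves, not experimental data.
toy_runs = [((0.0, 1.0), [3.0, 2.0]), ((0.0, 1.0), [1.0, 2.0]), ((0.5, 0.5), [4.0, 4.0])]
summary = group_curves(toy_runs)
print(summary[(0.0, 1.0)][0])  # [2. 2.]
```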
Run sets (name: number of runs):
2 --> 3 (linear) [20, 20, 10]: 275
2 --> 3 (linear; baselines) [20, 20, 10]: 10
2 --> 3 (relu) [20, 20, 10]: 275
2 --> 3 (relu; baselines) [20, 20, 10]: 0