
Teacher--student (v. 2.2)


Overall results

Results were obtained over 5 seeds each for the teacher, the student, and the expansion. So, for each of the 11 possible coefficient pairs on the 1-simplex (steps of size 0.1), we effectively did 5^3 = 125 runs.
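
For concreteness, a minimal Python sketch of how the coefficient grid and the seed combinations could be enumerated; all names are illustrative and not taken from the actual experiment code.

```python
import itertools

import numpy as np

# 11 coefficient pairs on the 1-simplex, in steps of 0.1:
# (0.0, 1.0), (0.1, 0.9), ..., (1.0, 0.0)
coefficient_pairs = [(round(a, 1), round(1.0 - a, 1)) for a in np.arange(0.0, 1.01, 0.1)]
assert len(coefficient_pairs) == 11

# 5 seeds each for the teacher, the student, and the expansion
seeds = range(5)
seed_combinations = list(itertools.product(seeds, repeat=3))
assert len(seed_combinations) == 125  # 5^3 runs per coefficient pair
```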
Experimental settings: Adam with default settings; minibatches of size 32; training for 1000 + 1000 epochs (1000 before expansion and 1000 after). Student initial parameters are all sampled from N(0, 1). A concrete sketch of this setup follows the settings below.
Data: Generated by a teacher NN with layer dimensions (20, 10, 10) and ReLU activation. 1000/100/100 train/validation/test split. Teacher weights sampled from Uniform(-1, 1).
Initialization: We add a tiny bit of noise to the new neuron parameters to break ties; the noise is sampled from N(0, 0.001^2).
Expansion: Always 2 --> 3.
Activation: We consider two student activation functions: linear and ReLU.
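
To make these settings concrete, here is a minimal PyTorch sketch of the data generation and the pre-expansion training loop. It is a sketch under stated assumptions: the input distribution (standard normal), the student hidden width of 2 before the 2 --> 3 expansion, and all function/variable names are illustrative and not taken from the actual experiment code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Teacher: layer dimensions (20, 10, 10), ReLU activation, weights ~ Uniform(-1, 1)
teacher = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 10))
with torch.no_grad():
    for p in teacher.parameters():
        p.uniform_(-1.0, 1.0)

def make_split(n, in_dim=20):
    # Inputs drawn from a standard normal: an assumption, the report does not
    # state the input distribution. Targets come from the fixed teacher.
    x = torch.randn(n, in_dim)
    with torch.no_grad():
        y = teacher(x)
    return x, y

# 1000/100/100 dataset split
(train_x, train_y), (val_x, val_y), (test_x, test_y) = (
    make_split(1000),
    make_split(100),
    make_split(100),
)

# Student: illustrative ReLU variant with a 2-unit hidden layer before the
# 2 --> 3 expansion (the linear variant simply omits the ReLU);
# all parameters initialised from N(0, 1)
student = nn.Sequential(nn.Linear(20, 2), nn.ReLU(), nn.Linear(2, 10))
with torch.no_grad():
    for p in student.parameters():
        p.normal_(mean=0.0, std=1.0)

# Pre-expansion phase: Adam with default settings, minibatches of size 32,
# 1000 epochs (another 1000 follow after the expansion, not shown here)
optimizer = torch.optim.Adam(student.parameters())
loss_fn = nn.MSELoss()
for epoch in range(1000):
    perm = torch.randperm(train_x.shape[0])
    for i in range(0, train_x.shape[0], 32):
        idx = perm[i : i + 32]
        optimizer.zero_grad()
        loss = loss_fn(student(train_x[idx]), train_y[idx])
        loss.backward()
        optimizer.step()
```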

[Figure: validation MSE vs. epoch (0-800), one curve per coefficient pair from (0.00, 1.00) to (1.00, 0.00) in steps of 0.1, showing val_mse_pretrain and val_mse_expand for each pair; values roughly in the 4.4-4.7 range.]
Run groups (number of runs in each group):
2 --> 3 (linear) [20, 20, 10]: 275
2 --> 3 (linear; baselines) [20, 20, 10]: 0
2 --> 3 (relu) [20, 20, 10]: 1100
2 --> 3 (relu; baselines) [20, 20, 10]: 0