Teacher--student (V2)
Created on November 17|Last edited on November 29
Overall results
Results are obtained over 5 seeds each for teacher, student, and expansion. So, for each of the 11 coefficient pairs on the 1-simplex (we take steps of size 0.1), we effectively did 5^3 = 125 runs.
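To make the run count concrete, here is a minimal sketch of how the grid could be enumerated. The names `coefficient_pairs`, `seed_triples`, and `runs` are hypothetical, and the seed values 0..4 are an assumption; only the counts come from the settings above.

```python
from itertools import product

N_STEPS = 10   # step size 0.1 on the 1-simplex gives 11 points
N_SEEDS = 5    # seeds per source of randomness (seed values 0..4 are assumed)

# Coefficient pairs (a, b) with a + b = 1 and a = 0.0, 0.1, ..., 1.0.
coefficient_pairs = [(i / N_STEPS, (N_STEPS - i) / N_STEPS) for i in range(N_STEPS + 1)]

# Every combination of (teacher seed, student seed, expansion seed).
seed_triples = list(product(range(N_SEEDS), repeat=3))

# One run per (coefficient pair, seed triple).
runs = [(pair, seeds) for pair in coefficient_pairs for seeds in seed_triples]

print(len(coefficient_pairs), len(seed_triples), len(runs))
```

Under these assumptions, each coefficient pair gets 125 runs, for 1375 runs per student activation.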
Experimental settings: Adam with default settings; minibatches of size 32; training for 1000 + 1000 epochs. Student initial parameters are all sampled from N(0, 1).
Data: Generated from a teacher NN with layer dimensions (20, 10, 10) and ReLU activation. The dataset is split 1000/100/100. Teacher weights are sampled from Uniform(-1, 1).
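A rough NumPy sketch of this data generation follows. Several details are not stated above and are assumptions here: the inputs are standard normal, the teacher has no biases, and ReLU is applied to the hidden layer only. The function name `make_teacher_dataset` is hypothetical.

```python
import numpy as np

def make_teacher_dataset(seed, dims=(20, 10, 10), n_samples=1200):
    """Sample a random teacher network and label random inputs with it."""
    rng = np.random.default_rng(seed)
    # Teacher weights from Uniform(-1, 1); no bias terms (assumption).
    weights = [
        rng.uniform(-1.0, 1.0, size=(d_in, d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]
    # Input distribution is not specified in the report; standard normal assumed.
    x = rng.standard_normal((n_samples, dims[0]))
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only (assumption)
    return x, h

x, y = make_teacher_dataset(seed=0)
# 1000/100/100 split; what each part is used for is not stated in the report.
x_parts = np.split(x, [1000, 1100])
y_parts = np.split(y, [1000, 1100])
print([p.shape for p in x_parts], [p.shape for p in y_parts])
```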
Initialization: We add a small amount of noise to the new neuron's parameters to break ties. The noise is sampled from N(0, (0.001)^2).
Expansion: Always 2-->3.
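As an illustration of the noisy initialization in a 2-->3 expansion, here is a minimal sketch. It assumes that "2-->3" refers to the hidden width and that the student is a bias-free two-layer network. How the new neuron's base parameters are chosen is not described above; the convex combination of the two existing neurons in the usage example is purely an assumption for illustration. The function name `expand_hidden_layer` is hypothetical.

```python
import numpy as np

def expand_hidden_layer(w_in, w_out, base_in, base_out, rng, noise_std=1e-3):
    """Append one hidden neuron whose parameters are base values plus N(0, noise_std^2) noise.

    w_in:  (d_in, k) input-to-hidden weights
    w_out: (k, d_out) hidden-to-output weights
    """
    new_in = base_in + rng.normal(0.0, noise_std, size=base_in.shape)
    new_out = base_out + rng.normal(0.0, noise_std, size=base_out.shape)
    return np.column_stack([w_in, new_in]), np.vstack([w_out, new_out])

rng = np.random.default_rng(0)
# Student parameters sampled from N(0, 1), hidden width 2.
w_in = rng.standard_normal((20, 2))
w_out = rng.standard_normal((2, 10))

# Assumed for illustration only: base parameters as a convex combination
# of the two existing neurons, with coefficients on the 1-simplex.
coeffs = np.array([0.3, 0.7])
base_in = w_in @ coeffs     # shape (20,)
base_out = coeffs @ w_out   # shape (10,)

w_in3, w_out3 = expand_hidden_layer(w_in, w_out, base_in, base_out, rng)
print(w_in3.shape, w_out3.shape)
```

The existing neurons are left untouched; only the appended column and row carry the noise.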
Activation: We consider two student activation functions: linear and ReLU.
2 --> 3 (linear) [20, 20, 10]