Teacher--student (v. 2.2)
Created on December 9 | Last edited on December 9
Overall results
Results are obtained over 5 seeds each for the teacher, the student, and the expansion. Thus, for each of the 11 coefficient pairs on the 1-simplex (step size 0.1), we effectively performed 5^3 = 125 runs.
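The run count can be checked by enumerating the grid. This is a minimal sketch: the 11 coefficient pairs are the points (a, 1 - a) with a = 0.0, 0.1, ..., 1.0, and every (teacher, student, expansion) seed triple is one run. The dictionary keys are hypothetical names chosen for illustration.

```python
import itertools

# Points (a, 1 - a) on the 1-simplex with step 0.1. Integer steps avoid the
# floating-point drift that repeated addition of 0.1 would introduce.
STEPS = 10
coefficient_pairs = [(k / STEPS, (STEPS - k) / STEPS) for k in range(STEPS + 1)]

# 5 seeds each for teacher, student and expansion; every combination is one run.
SEEDS = range(5)
seed_triples = list(itertools.product(SEEDS, SEEDS, SEEDS))

# Hypothetical run records: one per (coefficient pair, seed triple).
runs = [
    {"coefficients": pair, "teacher_seed": t, "student_seed": s, "expansion_seed": e}
    for pair in coefficient_pairs
    for (t, s, e) in seed_triples
]

print(len(coefficient_pairs))  # 11
print(len(seed_triples))       # 125 runs per coefficient pair
print(len(runs))               # 1375 runs in total
```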
Experimental settings: Adam with default settings; minibatches of size 32; training for 1000 + 1000 epochs. All student parameters are initially sampled from N(0, 1).
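The optimization setup can be sketched in NumPy as follows. This is an illustration under assumptions, not the code used for the report: "default settings" is taken to mean the torch.optim.Adam defaults (lr = 1e-3, betas = (0.9, 0.999), eps = 1e-8), and the helper names `init_student`, `minibatches` and `Adam` are hypothetical.

```python
import numpy as np

# Assumed "default settings": the torch.optim.Adam defaults, which match the
# values suggested by Kingma & Ba. The report does not list them explicitly.
LR, BETA1, BETA2, EPS = 1e-3, 0.9, 0.999, 1e-8
BATCH_SIZE = 32


def init_student(layer_sizes, rng):
    """Every student weight and bias is drawn i.i.d. from N(0, 1)."""
    return [
        (rng.standard_normal((n_in, n_out)), rng.standard_normal(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def minibatches(n_samples, rng, batch_size=BATCH_SIZE):
    """Yield shuffled index arrays; the last batch may be smaller."""
    perm = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield perm[start:start + batch_size]


class Adam:
    """Standard Adam with bias correction; updates parameter arrays in place."""

    def __init__(self, params):
        self.m = [np.zeros_like(p) for p in params]
        self.v = [np.zeros_like(p) for p in params]
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        for p, g, m, v in zip(params, grads, self.m, self.v):
            m[...] = BETA1 * m + (1 - BETA1) * g      # first-moment estimate
            v[...] = BETA2 * v + (1 - BETA2) * g * g  # second-moment estimate
            m_hat = m / (1 - BETA1 ** self.t)
            v_hat = v / (1 - BETA2 ** self.t)
            p -= LR * m_hat / (np.sqrt(v_hat) + EPS)


# Usage; the student layer sizes [20, 20, 10] are taken from the panel title (assumption).
rng = np.random.default_rng(0)
student = init_student([20, 20, 10], rng)
params = [array for layer in student for array in layer]
optimizer = Adam(params)
print(len(list(minibatches(1000, rng))))  # 32 batches per epoch for 1000 samples
```

With 1000 training samples and batch size 32, each epoch has 31 full batches plus one batch of 8.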
Data: Generated by a teacher NN with layer sizes (20, 10, 10) and ReLU activations. Dataset split: 1000/100/100. Teacher weights are sampled from Uniform(-1, 1).
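A minimal sketch of the teacher data generation follows. Several details are not stated in the report and are assumptions here: inputs are drawn from N(0, I), biases are also sampled from Uniform(-1, 1), the output layer is linear, and the three splits are taken in the order 1000/100/100. Function names are hypothetical.

```python
import numpy as np


def make_teacher(layer_sizes, rng):
    """Teacher weights sampled from Uniform(-1, 1); biases too (assumption)."""
    return [
        (rng.uniform(-1.0, 1.0, (n_in, n_out)), rng.uniform(-1.0, 1.0, n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def teacher_forward(params, x):
    """ReLU after every hidden layer; the output layer is left linear (assumption)."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h


def make_dataset(rng, split_sizes=(1000, 100, 100), layer_sizes=(20, 10, 10)):
    """Label N(0, I) inputs (assumption) with the teacher and cut them into splits."""
    teacher = make_teacher(list(layer_sizes), rng)
    n_total = sum(split_sizes)
    x = rng.standard_normal((n_total, layer_sizes[0]))
    y = teacher_forward(teacher, x)
    splits, start = [], 0
    for size in split_sizes:
        splits.append((x[start:start + size], y[start:start + size]))
        start += size
    return splits


train_set, val_set, test_set = make_dataset(np.random.default_rng(0))
print(train_set[0].shape, train_set[1].shape)  # (1000, 20) (1000, 10)
```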
Initialization: We add a small amount of noise, drawn from N(0, (0.001)^2), to the parameters of the new neuron to break ties.
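The tie-breaking step can be sketched as below. The report does not say how the new neuron's parameters are set before the noise is added; purely for illustration, the example copies the incoming weights of an existing neuron, which would otherwise receive exactly the same gradients as the new one. `add_tie_breaking_noise` and the copy rule are hypothetical.

```python
import numpy as np

NOISE_STD = 1e-3  # standard deviation of N(0, (0.001)^2)


def add_tie_breaking_noise(new_params, rng, std=NOISE_STD):
    """Return copies of the new neuron's parameters with small Gaussian noise added."""
    return [p + rng.normal(0.0, std, size=p.shape) for p in new_params]


rng = np.random.default_rng(0)
w_in = rng.standard_normal((20, 2))  # hypothetical incoming weights of a 2-neuron layer
copied = w_in[:, :1].copy()          # hypothetical: new neuron copied from neuron 0
(noisy,) = add_tie_breaking_noise([copied], rng)
w_expanded = np.concatenate([w_in, noisy], axis=1)  # 2 -> 3 neurons
print(w_expanded.shape)  # (20, 3)
```

Without the noise, columns 0 and 2 of `w_expanded` would be identical and would stay identical under gradient descent; the perturbation is small enough to leave the network's function essentially unchanged.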
Expansion: Always 2-->3.
Activation: We consider two student activation functions: linear and ReLU.
Figure: 2 --> 3 (linear) [20, 20, 10]