Teacher--student (V2.1)
Created on November 29|Last edited on December 2
Overall results
Results obtained over 5 seeds each for the teacher, the student, and the expansion. For each of the 11 coefficient pairs on the 1-simplex (step size 0.1), this gives 5^3 = 125 runs.
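As a rough illustration of the run count, the grid can be enumerated as below. This is a minimal sketch, not the code used for the report: the names `coefficient_pairs`, `seed_triples`, and `runs` are hypothetical, and the seed values 0 to 4 are an assumption. What the coefficients weight is not stated here, so the sketch only lists them.

```python
from itertools import product

N_STEPS = 10   # step size 0.1 on the 1-simplex
N_SEEDS = 5    # seeds per source of randomness (seed values 0..4 are assumed)

# 11 pairs (a, 1 - a) with a = 0.0, 0.1, ..., 1.0
coefficient_pairs = [(i / N_STEPS, (N_STEPS - i) / N_STEPS) for i in range(N_STEPS + 1)]

# One run per (teacher seed, student seed, expansion seed) triple
seed_triples = list(product(range(N_SEEDS), repeat=3))

# Full grid: every coefficient pair combined with every seed triple
runs = [(pair, seeds) for pair in coefficient_pairs for seeds in seed_triples]
```

Each coefficient pair is paired with all 125 seed triples, so the full grid has 11 × 125 = 1375 runs.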
Experimental settings: Adam with default settings; minibatches of size 32; training for 1000 + 1000 epochs. Student initial parameters are all sampled from N(0, 1).
Data: Generated by a NN with ReLU activation and layer dimensions (20, 10, 10). 1000/100/100 dataset. Weights sampled from Uniform(-1, 1).
Initialization: We add small noise, sampled from N(0, (0.001)^2), to the new neuron's parameters to break ties.
Expansion: Always 2-->3.
Activation: We consider two student activation functions: linear and ReLU.
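The data and expansion settings above can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions that the report does not confirm: inputs are drawn from a standard normal, the data-generating network has no biases and applies ReLU only on the hidden layer, 1000/100/100 is read as a train/validation/test split, and the base values of the new neuron (zeros here) are a placeholder. The function names `make_teacher_data` and `expand_hidden` are hypothetical.

```python
import numpy as np

def make_teacher_data(seed, dims=(20, 10, 10), sizes=(1000, 100, 100)):
    """Generate inputs and targets from a random two-layer ReLU network."""
    rng = np.random.default_rng(seed)
    d_in, d_hidden, d_out = dims
    # Weights sampled from Uniform(-1, 1), as stated in the report
    W1 = rng.uniform(-1.0, 1.0, size=(d_in, d_hidden))
    W2 = rng.uniform(-1.0, 1.0, size=(d_hidden, d_out))
    # Assumption: standard-normal inputs, no biases, ReLU on the hidden layer only
    X = rng.standard_normal((sum(sizes), d_in))
    Y = np.maximum(X @ W1, 0.0) @ W2
    # Cut into three consecutive splits of the requested sizes
    cuts = np.cumsum(sizes)[:-1]
    return np.split(X, cuts), np.split(Y, cuts)

def expand_hidden(W_in, W_out, new_in, new_out, rng, noise_std=1e-3):
    """Append one hidden neuron whose parameters are base values plus N(0, noise_std^2) noise."""
    noisy_in = new_in + rng.normal(0.0, noise_std, size=new_in.shape)
    noisy_out = new_out + rng.normal(0.0, noise_std, size=new_out.shape)
    W_in_new = np.concatenate([W_in, noisy_in[:, None]], axis=1)     # (d_in, h + 1)
    W_out_new = np.concatenate([W_out, noisy_out[None, :]], axis=0)  # (h + 1, d_out)
    return W_in_new, W_out_new

rng = np.random.default_rng(0)
(X_train, X_val, X_test), (Y_train, Y_val, Y_test) = make_teacher_data(seed=0)

# Student with 2 hidden neurons, all parameters sampled from N(0, 1)
W_in = rng.standard_normal((20, 2))
W_out = rng.standard_normal((2, 10))

# 2 --> 3 expansion; zero base values are a placeholder, not the report's scheme
W_in3, W_out3 = expand_hidden(W_in, W_out, np.zeros(20), np.zeros(10), rng)
```

The existing neurons are left unchanged by the expansion; only the appended column and row carry the tie-breaking noise.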
[Plot panel: 2 --> 3 (relu) [20, 20, 10]]