Teacher--student (V1)
Created on November 16 | Last edited on November 16
Overall results
Results are obtained over 10 runs for each of the 4 most interesting run combinations.
Experimental settings: Adam with default settings; minibatches of size 32; training for 100 + 100 epochs. Initial parameters are all sampled from N(0, 1).
Data: Generated from a NN with ReLU activations; its layer dimensionalities are given in brackets. Dataset sizes are 1000/100/100.
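A minimal sketch of how such data could be generated, for illustration only. Everything specific here is an assumption rather than something stated in the report: the layer sizes in `dims`, the absence of biases, the linear output layer, N(0, 1) teacher weights and inputs, and reading 1000/100/100 as a train/validation/test split. The helper names `make_teacher` and `teacher_forward` are hypothetical.

```python
import numpy as np

def make_teacher(dims, rng):
    """Sample one weight matrix per layer from N(0, 1) (assumed, no biases)."""
    return [rng.standard_normal((d_in, d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def teacher_forward(weights, x):
    """ReLU after every hidden layer; the output layer is left linear (assumed)."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)
    return h @ weights[-1]

rng = np.random.default_rng(0)
dims = [10, 5, 1]  # hypothetical sizes; the report gives the real ones in brackets
teacher = make_teacher(dims, rng)

# 1000 + 100 + 100 samples, with inputs drawn from N(0, 1) (assumed).
x = rng.standard_normal((1200, dims[0]))
y = teacher_forward(teacher, x)

# Assumed meaning of 1000/100/100: train / validation / test.
x_train, x_val, x_test = np.split(x, [1000, 1100])
y_train, y_val, y_test = np.split(y, [1000, 1100])
```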
Types of initialization: We add a tiny bit of noise to all the matrices to break ties.
- Random: Sample new matrix from N(0, 1)
- Random adjusted: Sample from N(\mu, \sigma), where \mu is the mean of the parent matrix and \sigma is its std
- Permuted: Randomly shuffle the entries of the parent matrix
- Copy: Copy the parent matrix
- Copy half: Copy the matrix and halve it
Expansion: Varies
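The initialization types listed above could be sketched roughly as follows. This is an illustrative NumPy version, not the code used for the runs. The function name `init_from_parent`, the `kind` strings, and the `noise_scale` value are hypothetical, since the report only says "a tiny bit of noise". "Copy half" is read here as dividing every entry by two, and \sigma is treated as a standard deviation.

```python
import numpy as np

def init_from_parent(parent, kind, rng, noise_scale=1e-3):
    """Build a new matrix with the same shape as `parent`.

    `noise_scale` is an assumed value for the tie-breaking noise.
    """
    if kind == "random":
        new = rng.standard_normal(parent.shape)             # N(0, 1)
    elif kind == "random_adjusted":
        # Match the parent's mean and std.
        new = rng.normal(parent.mean(), parent.std(), size=parent.shape)
    elif kind == "permuted":
        # Shuffle all entries, then restore the original shape.
        new = rng.permutation(parent.ravel()).reshape(parent.shape)
    elif kind == "copy":
        new = parent.copy()
    elif kind == "copy_half":
        new = parent / 2.0                                  # assumed reading of "half it"
    else:
        raise ValueError(f"unknown initialization type: {kind}")
    # Small Gaussian noise on every matrix to break ties.
    return new + noise_scale * rng.standard_normal(parent.shape)
```

For example, `init_from_parent(parent, "permuted", np.random.default_rng(0))` returns a matrix holding the parent's entries in a shuffled order, plus the small tie-breaking noise.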