Sweeps for 2-->3 experiments
Created on December 23 | Last edited on December 28
We consider the three methods in the PDF. The setup is the same teacher-student setup as before.
We consider two ways of generating the linear combination (which is a 2x1 matrix in the 2-->3 case):
- convex: We sample one weight $w_1$ and set the other weight as $w_2 = 1 - w_1$
- non-convex: We sample the two weights from
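
The two schemes above can be sketched roughly as follows. This is only an illustration: the sampling distributions are not recoverable from this page, so the uniform distribution on [0, 1] (convex case) and the standard normal (non-convex case) are assumptions, as are the names `sample_combination` and `grow_hidden_layer` and the idea that the new third unit is built as a weighted sum of the two existing units' weight rows.

```python
import random


def sample_combination(mode, rng):
    """Return the 2x1 combination matrix as a nested list [[w1], [w2]]."""
    if mode == "convex":
        # Assumption: the first weight is drawn uniformly from [0, 1].
        w1 = rng.uniform(0.0, 1.0)
        weights = [w1, 1.0 - w1]  # the two weights sum to 1
    elif mode == "non-convex":
        # Assumption: both weights are independent standard normal draws.
        weights = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [[w] for w in weights]


def grow_hidden_layer(rows, combination):
    """Hypothetical 2-->3 growth: append a row that linearly combines the two existing rows."""
    (w1,), (w2,) = combination
    new_row = [w1 * a + w2 * b for a, b in zip(rows[0], rows[1])]
    return [list(rows[0]), list(rows[1]), new_row]


rng = random.Random(0)
hidden_rows = [[1.0, 2.0], [3.0, 4.0]]  # toy weights for the two existing units
combination = sample_combination("convex", rng)
grown = grow_hidden_layer(hidden_rows, combination)
print(combination, grown[2])
```

In the convex case the new row always lies on the segment between the two existing rows; in the non-convex case it can land anywhere in their span.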
I recommend going by the table, but I left the graph in as well; just beware that it might not display all the runs. The table has two tabs: one for the linear/identity activation case and one for the ReLU activation case. The entries are sorted in descending order of epochs until convergence (that is, to surface the settings that, across re-runs, seemed to lead to faster convergence upon growth).
2-->3 relu: 396 runs
2-->3 linear: 372 runs