Sweeps for 2-->3 experiments

Created on December 23|Last edited on December 28
We consider the three methods in the PDF. The setup is the same teacher-student setup as before.
We consider two ways of generating the linear combination $A$ (which is a 2x1 matrix in the 2-->3 case):
  1. convex: We sample one weight $w \sim \text{Uniform}(0, 1)$, and set the other weight to $1 - w$
  2. non-convex: We sample the two weights from $\mathcal{N}(0, 1^2)$
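As a minimal sketch of the two sampling schemes above, assuming NumPy: the function name `sample_combination` and the string labels `"convex"` / `"non_convex"` are illustrative choices (the labels mirror the `run_mode` values in the graph), not the code used for these sweeps.

```python
import numpy as np


def sample_combination(run_mode, rng):
    """Sample a 2x1 combination matrix A (hypothetical helper, for illustration)."""
    if run_mode == "convex":
        # One weight w ~ Uniform(0, 1); the other is 1 - w, so the weights sum to 1.
        w = rng.uniform(0.0, 1.0)
        return np.array([[w], [1.0 - w]])
    if run_mode == "non_convex":
        # Both weights drawn independently from N(0, 1^2).
        return rng.normal(loc=0.0, scale=1.0, size=(2, 1))
    raise ValueError(f"unknown run_mode: {run_mode!r}")


rng = np.random.default_rng(0)
A_convex = sample_combination("convex", rng)
A_non_convex = sample_combination("non_convex", rng)
```

In the convex case both weights lie in $[0, 1]$ and sum to 1; in the non-convex case they are unconstrained and may be negative.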
I recommend going by the table. I left the graph in as well, but beware that it might not display all the runs. The table has two tabs: one for the linear/identity activation case, and one for the relu activation case. The entries are sorted in descending order of epochs until convergence (i.e., by the settings of $A$ that, across re-runs, seemed to lead to faster convergence upon growth).

[Graph. Axes: run_mode (convex, non_convex); mode (fp, fp_trivial, no_scale); activation (linear, relu); weights_abs_diff; avg_large_final_loss; avg_epochs_until_convergence]
2-->3 relu
396
2-->3 linear
372