
Fixed New Dataset

Created on June 4 · Last edited on June 8
These experiments all use the new long_range_memory_dataset and my 'candidate' learning rule.

Is one layer or two better?


[Line chart: avg_loss]

Should it have a high or low hidden size?

Seems like lower is better for these fast iterations.
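To sweep hidden size systematically, a W&B grid sweep works well. Below is a minimal sketch of a sweep config; the parameter names and value range are my assumptions, not the actual setup used in these runs:

```python
# Minimal W&B grid-sweep config over hidden size (and layer count).
# Parameter names and value ranges here are assumptions, not the logged runs.
sweep_config = {
    "method": "grid",
    "metric": {"name": "avg_loss", "goal": "minimize"},
    "parameters": {
        "hidden_size": {"values": [32, 64, 128, 256]},
        "num_layers": {"values": [1, 2]},
    },
}
```

This config would be passed to `wandb.sweep(sweep_config)` and each agent run would read `wandb.config.hidden_size`.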



What's the ideal learning rate here?

It really seems like I can keep going smaller.
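Since smaller rates keep winning, a log-spaced grid that extends downward is the natural next sweep. A small stdlib-only sketch; the bounds and step factor are assumptions:

```python
# Log-spaced learning-rate candidates, extending downward since
# smaller rates kept winning. Bounds and factor are assumptions.
def lr_grid(start: float = 1e-2, stop: float = 1e-6, factor: float = 10.0):
    """Yield learning rates from start down to stop, dividing by factor each step."""
    lr = start
    while lr >= stop * 0.999:  # small tolerance for float drift
        yield lr
        lr /= factor

rates = list(lr_grid())  # roughly [1e-2, 1e-3, 1e-4, 1e-5, 1e-6]
```

Each rate would then be launched as its own run, logging avg_loss for the line chart above.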



Does my learning rule gain an advantage in deeper networks?



Actually compare to backprop


Does backprop really just need higher learning rates?



Try different decay rates
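For the decay sweep, the usual exponential schedule is lr_t = lr0 · decay^epoch. A minimal sketch; the specific decay values and base rate are assumptions, not the logged runs:

```python
# Exponential learning-rate decay: lr_t = lr0 * decay**epoch.
# The decay values and base rate below are assumptions, not the logged runs.
def decayed_lr(lr0: float, decay: float, epoch: int) -> float:
    """Learning rate at a given epoch under exponential decay."""
    return lr0 * (decay ** epoch)

# Compare how quickly each candidate decay shrinks the rate.
for decay in (0.99, 0.95, 0.90):
    schedule = [decayed_lr(1e-3, decay, e) for e in (0, 10, 50)]
    print(decay, schedule)
```

Slower decays (closer to 1.0) keep the rate high for longer, which interacts with the "smaller is better" observation above.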




Backprop v. Wackprop: Ultimate Matchup




Heroes


[Run set: 1019 runs]