
Fixed New Dataset

Created on June 4 | Last edited on June 8
These runs all use the new long_range_memory_dataset and my 'candidate' learning rule.

Is one layer or two better?


[Chart: metric vs. training step for 1-layer and 2-layer runs, hidden size 16]
[Run set: 8 runs]
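A minimal sketch of how this depth comparison could be launched as a W&B grid sweep. The `train` stub, the `n_layers` / `hidden_size` parameter names, the `eval_score` metric, and the project name are all assumptions; the actual training code for the candidate rule isn't reproduced here.

```python
import wandb

# Hypothetical grid sweep: 1 vs. 2 layers at a fixed hidden size of 16.
sweep_config = {
    "method": "grid",
    "metric": {"name": "eval_score", "goal": "maximize"},
    "parameters": {
        "n_layers": {"values": [1, 2]},
        "hidden_size": {"values": [16]},
    },
}

def train():
    # Placeholder for the real training loop using the candidate learning rule.
    with wandb.init() as run:
        cfg = run.config
        for step in range(4000):
            # ... build a model with cfg.n_layers / cfg.hidden_size, train, evaluate ...
            run.log({"eval_score": 0.0, "step": step})  # dummy value

sweep_id = wandb.sweep(sweep_config, project="long-range-memory")
wandb.agent(sweep_id, function=train)
```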

Should it have a high or low hidden size?
Lower seems to be better for these fast iterations.

[Run set: 23 runs]
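Part of why smaller hidden sizes iterate faster is plain parameter count: the recurrent weight matrix alone is hidden_size × hidden_size, so cost grows roughly quadratically. A quick sketch of the scaling, assuming a plain one-layer RNN as a stand-in (the actual architecture isn't shown here):

```python
import torch.nn as nn

# Parameter count for a plain one-layer RNN at different hidden sizes.
# The input size and the RNN itself are placeholders, not the actual model.
input_size = 16
for hidden_size in [16, 32, 64, 128, 256]:
    rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=1)
    n_params = sum(p.numel() for p in rnn.parameters())
    print(f"hidden_size={hidden_size:4d}  params={n_params}")
```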


What's the ideal learning rate here?

It really seems like I can keep going smaller.

[Run set: 41 runs]
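If smaller keeps helping, one way to probe it is a log-spaced grid that extends the lower end. The endpoints below are only an illustration, not the values used in these runs:

```python
import numpy as np

# Log-spaced learning-rate grid extending toward smaller values.
# Plug the list into a sweep config as {"values": [float(lr) for lr in learning_rates]}.
learning_rates = np.logspace(-2, -6, num=9)  # 1e-2 down to 1e-6
print([f"{lr:.1e}" for lr in learning_rates])
```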


Does my learning rule gain an advantage in deeper networks?


[Run set: 33 runs]

Actually compare to backprop

[Run set: 49 runs]

Does backprop really just need higher learning rates?

[Run set: 28 runs]
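A minimal sketch of the kind of backprop baseline this question implies: the same small model trained with plain SGD at a grid of progressively larger learning rates. The model, data, and learning-rate values are placeholders standing in for the real task setup, which isn't reproduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data: a toy regression task standing in for long_range_memory_dataset.
x = torch.randn(256, 16)
y = torch.randn(256, 1)

def run_backprop(lr, steps=200):
    # Plain backprop baseline: small MLP trained with vanilla SGD.
    model = nn.Sequential(nn.Linear(16, 16), nn.Tanh(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Scan a grid of increasingly large learning rates for the backprop baseline.
for lr in [1e-3, 3e-3, 1e-2, 3e-2, 1e-1]:
    print(f"lr={lr:.0e}  final_loss={run_backprop(lr):.4f}")
```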


Try different decay rates


[Run set: 28 runs]


Backprop v. Wackprop: Ultimate Matchup


[Run set: 44 runs]


Heroes


[Run set: 1965 runs]