New Dataset

Created on June 3 | Last edited on June 4
These runs all use the new long_range_memory_dataset and my 'candidate' learning rule.
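Each section below varies one knob (layer count, hidden size, learning rate, depth). For reference, here's a minimal sketch of how run sets like these could be generated with a W&B sweep; the parameter names, grids, and project name are assumptions for illustration, not the actual configs behind these runs:

```python
import wandb

# Hypothetical grid covering the questions below: layer count,
# hidden size, and learning rate. All values are illustrative.
sweep_config = {
    "method": "grid",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "num_layers": {"values": [1, 2]},
        "hidden_size": {"values": [32, 64, 128, 256]},
        "learning_rate": {"values": [1e-2, 1e-3, 1e-4, 1e-5]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="long-range-memory")
# wandb.agent(sweep_id, function=train)  # train() builds and fits one model per config
```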

Is one layer or two better?


[Chart (x-axis: Step): Run set, 16 runs; showing first 10]

Should it have a high or low hidden size?
Seems like lower is better for these fast iterations.

[Chart: Run set, 25 runs]


What's the ideal learning rate here?

Really seems like I can keep going smaller.

[Chart: Run set, 14 runs]


Backprop-trained benchmark.

It's still way better. Kinda concerning.

[Chart: backprop_only, 7 runs]
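For reference, the benchmark here is just ordinary end-to-end training. A minimal sketch, assuming a small MLP, Adam, and cross-entropy; the architecture and optimizer settings are assumptions, not the actual benchmark config:

```python
import torch
import torch.nn as nn

# Assumed baseline: a small MLP trained end-to-end with backprop.
# All dimensions here are illustrative, not the real benchmark's.
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    opt.zero_grad()
    loss = loss_fn(model(x), y)  # argument order matters: (output, label)
    loss.backward()              # full backprop through every layer
    opt.step()
    return loss.item()
```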


Does my learning rule gain an advantage in deeper networks?


[Chart: Run set, 752 runs]
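One way to run a depth comparison like this is to parameterize the model builder by depth. A sketch with hypothetical sizes and depth grid:

```python
import torch.nn as nn

def make_mlp(depth, hidden=128, d_in=64, d_out=10):
    # Stack `depth` hidden layers; all sizes here are illustrative.
    layers = [nn.Linear(d_in, hidden), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(hidden, hidden), nn.ReLU()]
    layers.append(nn.Linear(hidden, d_out))
    return nn.Sequential(*layers)

models = {d: make_mlp(d) for d in (1, 2, 4, 8)}  # assumed depth grid
```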

Actually compare to backprop.

[Chart: Run set, 19 runs]

Oops: that yellow run is my local algorithm, now that I fixed it. Turns out I had output and label swapped in the loss function. Let's redo this whole report now, smh.
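For posterity: an argument-order swap in a loss call can fail silently. I won't reproduce my exact loss call here, but here's a plausible way the swap runs without error, assuming float (e.g. one-hot) targets and F.cross_entropy:

```python
import torch
import torch.nn.functional as F

output = torch.randn(8, 10)                                # model logits: (batch, classes)
label = F.one_hot(torch.randint(0, 10, (8,)), 10).float()  # one-hot targets

correct = F.cross_entropy(output, label)  # input=logits, target=probabilities
swapped = F.cross_entropy(label, output)  # the bug: labels treated as logits
# With float targets, both calls run without error, so the swapped version
# silently optimizes a nonsense objective instead of crashing.
```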