New Dataset
Created on June 3 · Last edited on June 4
All of these runs use the new long_range_memory_dataset and my 'candidate' learning rule.
Is one layer or two better?
[Run set: 16 runs]
Should the hidden size be high or low?
It seems like a lower hidden size is better for these fast iterations.
[Run set: 25 runs]
What's the ideal learning rate here?
It really seems like I can keep going smaller.
[Run set: 14 runs]
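(For reference, here's a rough sketch of how a sweep over the three questions above, depth, hidden size, and learning rate, could be launched. `train_with_candidate_rule`, the project name, and the value grids are all hypothetical placeholders rather than my actual setup; only the wandb calls themselves are standard.)

```python
import itertools
import wandb

def train_with_candidate_rule(n_layers, hidden_size, lr):
    """Hypothetical stand-in for the real training entry point;
    assumed to train one model and return its final validation loss."""
    raise NotImplementedError  # replace with the actual training call

# Sweep depth, hidden size, and learning rate in one grid.
# The value grids are illustrative, not the ones behind these run sets.
for n_layers, hidden, lr in itertools.product([1, 2], [32, 64, 128], [1e-3, 1e-4, 1e-5]):
    run = wandb.init(
        project="long-range-memory",  # placeholder project name
        config={"n_layers": n_layers, "hidden_size": hidden, "lr": lr},
        reinit=True,
    )
    val_loss = train_with_candidate_rule(n_layers, hidden, lr)
    run.log({"val_loss": val_loss})
    run.finish()
```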
Backprop-trained benchmark.
It's still way better. Kind of concerning.
[Run set: backprop_only, 7 runs]
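(The benchmark is presumably just an ordinary backprop training loop; a minimal PyTorch sketch, assuming a classification loss and a standard DataLoader, with the model left abstract:)

```python
import torch
import torch.nn.functional as F

def train_backprop_baseline(model, loader, lr=1e-3, epochs=10):
    """Plain SGD + backprop baseline to compare against the candidate rule."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()
    return model
```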
Does my learning rule gain an advantage in deeper networks?
[Run set: 752 runs]
Now actually comparing against backprop.
[Run set: 19 runs]
Oops: that yellow curve is my local algorithm, now that I've fixed it. It turns out I had the output and label swapped in the loss function. Time to redo this whole report.
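(For the record, the bug was a classic argument-order swap. A minimal illustration, assuming a PyTorch-style cross-entropy loss; the tensors here are fakes just to show the shapes:)

```python
import torch
import torch.nn.functional as F

outputs = torch.randn(8, 10)         # fake logits: batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))  # fake integer class labels

# Buggy version (what I had): arguments swapped, so the labels were being
# treated as logits. Depending on shapes/dtypes this either errors out or
# silently optimizes nonsense.
# loss = F.cross_entropy(labels, outputs)  # WRONG

# Fixed version: F.cross_entropy expects (logits, labels), in that order.
loss = F.cross_entropy(outputs, labels)
```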