completely unstructured play
Created on June 25|Last edited on June 28
Comparing learning rates for the main weights (lr) and for the plasticity (plr).
Run set: 28 runs
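To make the two-rate setup concrete, here is a minimal sketch of one update step in which the main weights and the plasticity parameters each get their own learning rate. Plain SGD, the flat-list parameters, and the names `sgd_step`, `"weights"`, and `"plasticity"` are all assumptions for illustration; the actual runs may use a different optimizer and parameter layout.

```python
def sgd_step(params, grads, lr, plr):
    """One plain SGD step with a separate rate per parameter group.

    `params` and `grads` are dicts with the hypothetical keys "weights"
    and "plasticity", each holding a flat list of floats.
    """
    rates = {"weights": lr, "plasticity": plr}
    return {
        name: [p - rates[name] * g for p, g in zip(values, grads[name])]
        for name, values in params.items()
    }


# Toy values, not taken from any run.
params = {"weights": [0.5, -0.25], "plasticity": [0.01, 0.02]}
grads = {"weights": [1.0, 1.0], "plasticity": [1.0, 1.0]}
updated = sgd_step(params, grads, lr=1e-5, plr=1e-7)
```

With equal gradients, the plasticity parameters move 100 times less than the main weights here, which is the kind of gap the lr/plr comparison is probing.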
The same comparison, with a hidden size of 256 instead of 128.
Run set: 24 runs
Really narrowing down what the best learning-rate pairing is. It looks like it's at least 1e-5 or 5e-5 for the lr, and 1e-7 for the plr.
Run set: 25 runs
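A sweep over lr/plr pairings like this one can be sketched as a simple grid. The lr values 1e-5 and 5e-5 and the plr value 1e-7 come from the note above; the neighbouring plr values and the helper name `make_grid` are hypothetical, added only so the grid has something to compare against.

```python
from itertools import product


def make_grid(lrs, plrs):
    """Return one config dict per (lr, plr) pairing."""
    return [{"lr": lr, "plr": plr} for lr, plr in product(lrs, plrs)]


lrs = [1e-5, 5e-5]          # main-weight rates mentioned in the note
plrs = [1e-8, 1e-7, 1e-6]   # 1e-7 is from the note; the others are assumed
configs = make_grid(lrs, plrs)

for cfg in configs:
    print(cfg)
```

Each dict would then be handed to whatever launches a training run.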
Now, let's explore some finer tuning, as well as deeper networks. We want it to learn fast, but not diverge or explode later. This set is at a hidden size of 256.
Run set: 41 runs
Run set: 36 runs
The same, at a hidden size of 512.
Run set: 16 runs
Some random stuff: experimenting with smaller batch sizes.
Run set: 22 runs
Some rudimentary experimentation with code changes.
Run set: 6 runs
More discipline this time. I try both a smaller and a larger clip for the plasticity values. I want to know whether either one improves anything, such as stability.
Run set: 10 runs
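For reference, clipping the plasticity values can be sketched as clamping each value into a symmetric range. The symmetric range, the helper name `clip_plasticity`, and the two bounds below are assumptions for illustration, not the settings used in these runs.

```python
def clip_plasticity(values, clip):
    """Clamp each plasticity value into [-clip, clip]."""
    return [max(-clip, min(clip, v)) for v in values]


values = [-2.0, 0.3, 5.0]                  # toy values, not from any run
tight = clip_plasticity(values, clip=1.0)  # smaller clip (hypothetical bound)
loose = clip_plasticity(values, clip=4.0)  # larger clip (hypothetical bound)
```

A tighter clip caps how far any single plasticity value can drift, which is the effect one would expect to matter for stability.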