
completely unstructured play

Created on June 25 | Last edited on June 28
Comparing learning rates for the main weights (lr) and for the plasticity (plr).

[Plot: x-axis is Step, 0 to 2.5k. Legend entries:]
0.001 1.0000e-10 5
0.001 1.0000e-15 5
0.001 1.0000e-20 5
0.0001 1 5
0.0001 0.001 5
0.0001 0.00001 5
0.0001 0.000001 5
0.0001 1.0000e-7 5
0.0001 1.0000e-8 5
0.0001 1.0000e-10 5
0.0001 1.0000e-15 5
0.0001 1.0000e-20 5
0.0001 1.0000e-30 5
0.0001 0 5
0.00001 0.001 5
Run set
28
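
To make the two rates concrete, here is a minimal sketch of an update step that uses one learning rate for the main weights and a separate one for the plasticity. Plain SGD, the group names `weights` and `plasticity`, and all the numbers are assumptions for illustration; these notes don't say which optimizer the runs actually use.

```python
# Hypothetical sketch: plain SGD with one learning rate per parameter group.
# The optimizer, group names, and values are assumptions, not the real setup.

def sgd_step(params, grads, lrs):
    """Return updated parameters, scaling each group's gradient by its own rate."""
    return {
        name: [p - lrs[name] * g for p, g in zip(values, grads[name])]
        for name, values in params.items()
    }

params = {"weights": [0.5, -0.25], "plasticity": [0.01, 0.02]}
grads = {"weights": [1.0, -2.0], "plasticity": [4.0, 4.0]}
rates = {"weights": 1e-4, "plasticity": 1e-7}  # lr and plr

updated = sgd_step(params, grads, rates)
```

With a plr of 0 the plasticity values never move, so a run like that works as a baseline with no learned plasticity.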

Same comparison, with a hidden size of 256 instead of 128.

Run set
24

Really narrowing down what the best learning-rate pairing is. Looks like it's at least a 1e-5 or 5e-5 lr, and a 1e-7 plr.

Run set
25
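
A narrowing sweep like this can be written down as a small grid of lr and plr pairs. This is only a sketch: the two lr values come from the note above, the plr values on either side of 1e-7 are hypothetical, and the config keys `lr` and `plr` are made-up names.

```python
from itertools import product

lr_values = [1e-5, 5e-5]          # the two candidates mentioned above
plr_values = [1e-8, 1e-7, 1e-6]   # hypothetical values around 1e-7

# One config per (lr, plr) pairing.
grid = [{"lr": lr, "plr": plr} for lr, plr in product(lr_values, plr_values)]

for cfg in grid:
    print(f"lr={cfg['lr']:g} plr={cfg['plr']:g}")
```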

Now, let's explore some finer tuning, as well as deeper layers. We want it to learn fast, but not diverge or explode later. This one is at a hidden size of 256.

Run set
41
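
One way to flag "learns fast but explodes later" automatically is to compare each logged loss against the best loss seen before it. The helper below is a hypothetical sketch: the name `diverged`, the default factor of 10, and the assumption that losses are positive are all choices made for illustration, not something these runs used.

```python
import math

def diverged(losses, factor=10.0):
    """True if a loss is non-finite or exceeds `factor` times the best earlier loss.

    Assumes positive loss values.
    """
    best = math.inf
    for loss in losses:
        if not math.isfinite(loss):
            return True
        if best < math.inf and loss > factor * best:
            return True
        best = min(best, loss)
    return False
```

For example, `diverged([5.0, 1.0, 0.5, 8.0])` is true because 8.0 is more than ten times the earlier best of 0.5, while a steadily decreasing curve returns false.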



Run set
36

512 hidden size

Run set
16

some random stuff - experiment with smaller batch sizes

Run set
22

some rudimentary experimentation with code changes

Run set
6

More disciplined this time. I try both a smaller and a larger clip for the plasticity values, to see whether either improves anything, such as stability.

Run set
10
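
For reference, clipping here could look like the sketch below, which clamps every plasticity value into a symmetric range. The symmetric range, the function name `clip_values`, and the example limits are assumptions; the notes only say that a smaller and a larger clip were tried.

```python
def clip_values(values, limit):
    """Clamp each value into the range [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]

plasticity = [-3.0, 0.2, 1.5]          # made-up values
tight = clip_values(plasticity, 1.0)   # smaller clip
loose = clip_values(plasticity, 2.0)   # larger clip
```

The tighter limit changes two of the three values here and the looser one changes only the first, which is the trade-off being tested: a tight clip bounds the plasticity more but also overrides more of what was learned.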