
completely unstructured play

Created on June 25|Last edited on June 28
Comparing learning rates for the main weights and for the plasticity.
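A minimal sketch of how such a comparison could be laid out as a grid of paired learning rates. The candidate values and the names `lr` (main weights) and `plr` (plasticity) are placeholders for illustration, not the values actually swept in these runs.

```python
from itertools import product

# Assumed candidate values, for illustration only.
lrs = [1e-4, 5e-5, 1e-5]    # learning rates for the main weights
plrs = [1e-6, 1e-7, 1e-8]   # learning rates for the plasticity

# One config per (lr, plr) pair; each would be launched as a separate run.
configs = [{"lr": lr, "plr": plr} for lr, plr in product(lrs, plrs)]

for cfg in configs:
    print(cfg)
```

A full grid grows quickly, so a coarse pass followed by a finer pass around the best pairs is usually cheaper than one dense grid.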


The same comparison with a hidden size of 256 instead of 128.


Trying to really narrow down which learning-rate pairing is best. It looks like the lr should be at least 1e-5 or 5e-5, with a plr of 1e-7.
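To make the lr/plr pairing concrete, here is a minimal sketch of one update step where the main weights and the plasticity values each get their own step size. It uses plain gradient descent on made-up numbers; the optimizer, parameters, and gradients are all assumptions, and only the two rates (1e-5 and 1e-7) come from the notes above.

```python
def sgd_step(params, grads, step_size):
    """Return the parameters after one plain gradient-descent step."""
    return [p - step_size * g for p, g in zip(params, grads)]

lr = 1e-5    # step size for the main weights
plr = 1e-7   # step size for the plasticity values

# Made-up parameters and gradients, for illustration only.
weights = [0.5, -0.25]
plasticity = [0.01, 0.02]
weight_grads = [1.0, -2.0]
plasticity_grads = [4.0, -4.0]

# Each group is updated with its own rate.
weights = sgd_step(weights, weight_grads, lr)
plasticity = sgd_step(plasticity, plasticity_grads, plr)
```

With plr two orders of magnitude below lr, the plasticity values move far more slowly than the main weights for gradients of similar size.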


Now let's explore some finer tuning, as well as more layers. We want it to learn fast, but not diverge or explode later. This one is at a hidden size of 256.


512 hidden size


Some random stuff: experimenting with smaller batch sizes.


Some rudimentary experimentation with code changes.


More disciplined this time. I try both a smaller and a larger clip for the plasticity values, to see whether either improves anything, such as stability.
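A minimal sketch of what a smaller versus a larger clip does to a set of plasticity values. It assumes symmetric clipping to the range [-clip, clip]; the function name, the thresholds 0.5 and 1.0, and the sample values are all hypothetical and not taken from the actual code.

```python
def clip_plasticity(values, clip):
    """Clamp every value to the symmetric range [-clip, clip]."""
    return [max(-clip, min(clip, v)) for v in values]

# Made-up plasticity values, for illustration only.
values = [-2.0, -0.3, 0.0, 0.7, 1.5]

small = clip_plasticity(values, 0.5)   # tighter clip
large = clip_plasticity(values, 1.0)   # looser clip

print(small)
print(large)
```

A tighter clip bounds how far any single plasticity value can drift, which may help stability, at the cost of limiting how strongly plastic a connection can become.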
