Comparing learning rates for the main weights (lr) and for the plasticity coefficients (plr).
[Line chart: loss]
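For reference, here's a minimal sketch of the kind of setup being compared, assuming a differentiable-plasticity layer in the style of Miconi et al.; the layer, the names (`w`, `alpha`, `hebb`), and the trace update are hypothetical, not the exact code behind these runs. The point is the two optimizer parameter groups: the main weights train at lr, the plasticity at plr.

```python
import torch
import torch.nn as nn

class PlasticLinear(nn.Module):
    """Linear layer whose effective weight is w + alpha * hebb,
    where alpha (the plasticity) gates a running Hebbian trace."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.alpha = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.register_buffer("hebb", torch.zeros(out_features, in_features))

    def forward(self, x):
        y = torch.tanh(x @ (self.w + self.alpha * self.hebb).t())
        with torch.no_grad():  # update the trace without tracking gradients
            self.hebb = 0.9 * self.hebb + 0.1 * (y.t() @ x) / x.shape[0]
        return y

model = PlasticLinear(128, 256)

# Two parameter groups, so lr and plr can be swept independently.
optimizer = torch.optim.Adam([
    {"params": [model.w], "lr": 1e-5},      # lr (main weights)
    {"params": [model.alpha], "lr": 1e-7},  # plr (plasticity)
])
```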
The same comparison, but with a hidden size of 256 instead of 128.
Let's really narrow down what the best learning-rate pairing is. It looks like it's at least an lr of 1e-5 or 5e-5, paired with a plr of 1e-7.
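A sketch of how a grid over such pairings can be run and logged, assuming a hypothetical `train_one_run` helper that builds the model, trains it, and logs `loss` to the active W&B run; the value lists are illustrative, not the exact grids above.

```python
import itertools
import wandb

# Hypothetical grids around the region that looked best above.
lrs = [1e-4, 5e-5, 1e-5, 5e-6]   # main-weight learning rates
plrs = [1e-6, 1e-7, 1e-8]        # plasticity learning rates

for lr, plr in itertools.product(lrs, plrs):
    run = wandb.init(project="plasticity", config={"lr": lr, "plr": plr},
                     reinit=True)
    train_one_run(lr=lr, plr=plr)  # assumed helper: trains and logs `loss`
    run.finish()
```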
Now let's explore some finer tuning, as well as deeper networks. We want the model to learn fast, but not diverge or explode later. This run set uses a hidden size of 256.
Hidden size 512.
Some exploratory runs: experimenting with smaller batch sizes.
Some rudimentary experimentation with code changes.
More disciplined experiments: I try both a smaller and a larger clip for the plasticity values, to see whether either one improves anything, such as stability.
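A minimal sketch of what clipping the plasticity values means here, reusing the hypothetical `alpha` parameter from the sketch above; the clamp bounds and step structure are assumptions, not the exact training loop.

```python
import torch

def train_step(model, optimizer, x, target, loss_fn, clip=0.01):
    """One step; `clip` is the hypothetical bound being varied across runs."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()
    optimizer.step()
    # Clamp the plasticity coefficients in place after the update;
    # the runs compare a smaller and a larger `clip` for stability.
    with torch.no_grad():
        model.alpha.clamp_(-clip, clip)
    return loss.item()
```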