
test new initialization

Created on July 3 | Last edited on July 9
Turns out I was modifying the initialization instead of keeping the default. Training seems much more stable after commenting out that line of code.
3 layers:

[Chart vs. Step (0–2k)]
[Run set: 9 runs]
[Run set: 5 runs]
Seems like this line got me in trouble:
hidden = (1.0/hidden.shape[1])*torch.tanh(hidden) # Apply tanh function to keep hidden from blowing up after many recurrences, as well as control for magnitude of recurrent connection
I initially put it in to reduce the volatility caused by the stronger recurrent connection, but I had only added it to the new algorithm, not the backprop one. Check out how changing it to this helped:
hidden = torch.tanh(hidden) # Apply tanh function to keep hidden from blowing up after many recurrences
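
For context, here is a minimal sketch of the kind of recurrent update these two lines sit in; the layer width, weight matrices, and step function are illustrative assumptions, not the actual model:

import torch

hidden_size = 128  # illustrative width, not the real model's

W_in = torch.randn(hidden_size, hidden_size) * 0.01
W_rec = torch.randn(hidden_size, hidden_size) * 0.01

def recurrent_step(x, hidden, scale_by_width=False):
    pre = x @ W_in + hidden @ W_rec
    if scale_by_width:
        # old version: tanh AND divide by the layer width, which also shrinks
        # the recurrent signal by a factor of hidden_size on every step
        return (1.0 / pre.shape[1]) * torch.tanh(pre)
    # new version: tanh alone already bounds the hidden state
    return torch.tanh(pre)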

[Run set: 25 runs]

another big sweep just to see

[Run set: 62 runs]

test out candecay values. they still make no difference.

[Run set: 10 runs]

tinker with lr and plr, changing the code to allow for more immediate changes, ignoring plasticity this time.
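
A rough sketch of what that tinkering loop could look like; the specific values and the train() stub are placeholders, not the real training script:

import itertools

def train(lr, plr):
    ...  # stand-in for the actual training loop

lrs = [1e-2, 1e-3, 1e-4]   # placeholder learning rates
plrs = [1e-2, 1e-3, 1e-4]  # placeholder plasticity learning rates

for lr, plr in itertools.product(lrs, plrs):
    train(lr=lr, plr=plr)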

[Run set: 20 runs]

Does the batch size screw things up? Perhaps it gets everything stuck in local minima. Maybe a small batch will let us break out, encouraging exploration.

[Run set: 21 runs]

batch size vs learning rate dynamics
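
One way to express this grid, sketched as a W&B sweep config; the values and project name are placeholders, and train() is a stub standing in for the actual training code:

import wandb

def train():
    run = wandb.init()
    bs, lr = run.config.batch_size, run.config.lr
    ...  # stand-in for the actual training loop using bs and lr

sweep_config = {
    "method": "grid",
    "parameters": {
        "batch_size": {"values": [8, 32, 128]},  # placeholder values
        "lr": {"values": [1e-2, 1e-3, 1e-4]},    # placeholder values
    },
}

sweep_id = wandb.sweep(sweep=sweep_config, project="test-new-initialization")  # hypothetical project
wandb.agent(sweep_id, function=train)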

[Run set: 21 runs]

some even smaller lr

[Run set: 18 runs]