test new initialization
Turns out I was modifying the initialization instead of keeping it at the default. Training seems much more stable after commenting out that line of code.
3 layers:
[Run set: 9 runs]
[Run set: 5 runs]
Seems like this line got me in trouble:
hidden = (1.0/hidden.shape[1])*torch.tanh(hidden) # Apply tanh function to keep hidden from blowing up after many recurrences, as well as control for magnitude of recurrent connection
I initially put it in to reduce volatility as a result of the stronger recurrent connection. I had only put it in the new algorithm though, not the backprop one. Check out how changing it to this helped:
hidden = torch.tanh(hidden) # Apply tanh function to keep hidden from blowing up after many recurrences
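For context, here is roughly how the two versions differ inside the recurrent step. This is a minimal sketch, not my actual model: W_in, W_rec, and the other names are placeholders.

import torch

def recurrent_step(x, hidden, W_in, W_rec, scale_by_width=False):
    pre = x @ W_in + hidden @ W_rec
    if scale_by_width:
        # Old version: dividing by the hidden width shrinks the state toward
        # zero every step, on top of the tanh squashing.
        return (1.0 / pre.shape[1]) * torch.tanh(pre)
    # Fixed version: tanh alone keeps the state bounded.
    return torch.tanh(pre)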
[Run set: 25 runs]
another big sweep just to see
[Run set: 62 runs]
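For reference, a sweep over these knobs can be launched roughly like this. The parameter names, value ranges, project name, and the train() entry point below are assumptions, not my actual sweep config.

import wandb

sweep_config = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "lr": {"values": [1e-4, 1e-3, 1e-2]},
        "plr": {"values": [1e-4, 1e-3, 1e-2]},
        "candecay": {"values": [0.0, 0.5, 0.9]},
        "batch_size": {"values": [16, 64, 256]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="test-new-initialization")  # project name is a guess
wandb.agent(sweep_id, function=train)  # train() reads wandb.config and runs one trial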
Test out candecay values. They still make no difference.
[Run set: 10 runs]
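I'm only guessing at what candecay actually touches; if it is a decay on the Hebbian candidate trace, the per-step update would look something like the sketch below. Every name here is illustrative, not pulled from my code.

import torch

def update_trace(hebb, pre, post, eta, candecay):
    # Hypothetical: decay the candidate/Hebbian trace by candecay each step,
    # then add the new outer-product term from pre- and post-synaptic activity.
    outer = torch.bmm(pre.unsqueeze(2), post.unsqueeze(1))  # (batch, n_pre, n_post)
    return (1.0 - candecay) * hebb + eta * outer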
Tinker with lr and plr, changing the code so they can be adjusted more immediately, ignoring plasticity this time.
[Run set: 20 runs]
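One way to make lr and plr independently adjustable is separate optimizer param groups, so each can be set straight from the sweep config. A sketch, under the assumption that the plasticity-related parameters have "plastic" in their names:

import torch

def make_optimizer(model, lr, plr):
    # Put the plasticity-related parameters in their own param group with plr;
    # everything else trains with the ordinary lr.
    plastic = [p for n, p in model.named_parameters() if "plastic" in n]
    base = [p for n, p in model.named_parameters() if "plastic" not in n]
    return torch.optim.Adam([
        {"params": base, "lr": lr},
        {"params": plastic, "lr": plr},
    ])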
Does the batch size screw things up? Perhaps it gets everything stuck in local minima. Maybe a small batch will let us break out, encouraging exploration.
[Run set: 21 runs]
batch size vs learning rate dynamics
[Run set: 21 runs]
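To map out the batch size / learning rate grid, something like the loop below works; the values, project name, and train() are placeholders.

import itertools
import wandb

# Grid over batch size and learning rate (values are placeholders).
for batch_size, lr in itertools.product([8, 32, 128], [1e-4, 1e-3, 1e-2]):
    run = wandb.init(project="test-new-initialization",  # project name is a guess
                     config={"batch_size": batch_size, "lr": lr},
                     reinit=True)
    train(run.config)  # hypothetical training entry point
    run.finish()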
some even smaller lr
[Run set: 18 runs]