
test new initialization

Created on July 3 | Last edited on July 9
Turns out I was modifying the initialization instead of keeping it at the default. Training seems much more stable after commenting out that line of code.
3 layers:

[Line chart: avg_loss across the run set]
Seems like this line got me in trouble:
hidden = (1.0/hidden.shape[1])*torch.tanh(hidden) # Apply tanh function to keep hidden from blowing up after many recurrences, as well as control for magnitude of recurrent connection
I initially put it in to reduce volatility from the stronger recurrent connection, but I had only put it in the new algorithm, not the backprop one. Check out how changing it to this helped:
hidden = torch.tanh(hidden) # Apply tanh function to keep hidden from blowing up after many recurrences
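A minimal sketch of why the first variant hurts, assuming a generic recurrent update (the weight names and hidden size here are illustrative, not the report's actual model): multiplying the tanh output by 1/hidden_dim shrinks every activation by the hidden width, collapsing the recurrent signal toward zero, while plain tanh already bounds activations in (-1, 1).

```python
import torch

torch.manual_seed(0)
hidden_dim = 128  # assumed width, for illustration only
W_rec = torch.randn(hidden_dim, hidden_dim) / hidden_dim**0.5
hidden = torch.randn(1, hidden_dim)

# Problematic variant: tanh output additionally scaled by 1/hidden_dim,
# shrinking activations by the hidden width on every recurrence.
scaled = (1.0 / hidden.shape[1]) * torch.tanh(hidden @ W_rec)

# Fixed variant: plain tanh keeps hidden bounded without collapsing it.
plain = torch.tanh(hidden @ W_rec)

print(scaled.abs().mean().item())  # far smaller than the plain variant
print(plain.abs().mean().item())
```

Applied at every step, that 1/hidden_dim factor compounds geometrically, which is consistent with the flat/unstable training seen before the fix.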


another big sweep just to see


test out candecay values. they still make no difference.


tinker with lr and plr, changing code to allow for more immediate changes, ignoring plasticity this time.


Does the batch size screw things up? Perhaps it gets everything stuck in local minima. Maybe a small batch will let us break out, encouraging exploration.
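One way to probe that hypothesis numerically: smaller batches give noisier gradient estimates, and that noise is what lets SGD jump out of shallow minima. A toy sketch (the regression data and sizes are assumptions, just to show the variance trend):

```python
import torch

torch.manual_seed(0)
# Toy linear-regression data, purely illustrative.
X = torch.randn(1024, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(1024, 1)
w = torch.zeros(10, 1, requires_grad=True)

def grad_for_batch(batch_size):
    """One stochastic gradient estimate from a random mini-batch."""
    idx = torch.randint(0, len(X), (batch_size,))
    loss = ((X[idx] @ w - y[idx]) ** 2).mean()
    (g,) = torch.autograd.grad(loss, w)
    return g.flatten()

# Gradient-estimate variance shrinks roughly like 1/batch_size,
# so small batches inject more noise ("exploration").
var_small = torch.stack([grad_for_batch(4) for _ in range(200)]).var(0).mean()
var_large = torch.stack([grad_for_batch(256) for _ in range(200)]).var(0).mean()
print(var_small.item(), var_large.item())
```

Whether that extra noise actually helps here is exactly what the sweep below is testing; the sketch only shows the mechanism by which a small batch could encourage exploration.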


batch size vs learning rate dynamics


some even smaller lr
