Turns out I was modifying the initialization instead of keeping the default. Training seems much more stable after commenting out that line of code.
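For context, here is a minimal sketch of the kind of mistake this was. The actual custom init isn't shown in this log, so the unit-variance Gaussian below is a hypothetical stand-in; the point is just that overriding PyTorch's default Linear init (Kaiming-uniform) can blow up the weight scale for a wide layer.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(64, 64)

# PyTorch's default nn.Linear init is Kaiming-uniform, scaled by fan-in,
# so weights for a 64-wide layer are small (std ~ 0.07)
default_std = layer.weight.std().item()

# a hypothetical custom re-initialization (stand-in for the line I commented out):
with torch.no_grad():
    layer.weight.normal_(0.0, 1.0)  # unit-variance Gaussian: far too large here
custom_std = layer.weight.std().item()

print(default_std, custom_std)  # custom weights are an order of magnitude larger
```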
3 layers:
[Line chart: avg_loss, plotted for the selected run set]
Seems like this line got me in trouble:
hidden = (1.0 / hidden.shape[1]) * torch.tanh(hidden)  # apply tanh to keep hidden from blowing up after many recurrences, and to control the magnitude of the recurrent connection
I initially put it in to reduce volatility from the stronger recurrent connection. I had only put it in the new algorithm, though, not the backprop one. Check out how changing it to this helped:
hidden = torch.tanh(hidden)  # apply tanh to keep hidden from blowing up after many recurrences
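The damping becomes obvious if you iterate both updates on a random hidden state: the extra 1/hidden_size factor shrinks the state geometrically toward zero, wiping out the recurrent signal, while tanh alone keeps it order-one. A minimal sketch (the hidden size of 128 is an assumption, not the model's actual width, and a real step would also add input and weight terms):

```python
import torch

torch.manual_seed(0)
hidden_size = 128  # assumed width; the real model's size isn't stated in this log

h_scaled = torch.randn(1, hidden_size)
h_plain = h_scaled.clone()

for _ in range(10):  # ten recurrent applications
    # original (problematic) update: the 1/hidden_size factor compounds each step
    h_scaled = (1.0 / h_scaled.shape[1]) * torch.tanh(h_scaled)
    # fixed update: tanh alone already bounds the state in (-1, 1)
    h_plain = torch.tanh(h_plain)

print(h_scaled.abs().max().item())  # collapses toward zero
print(h_plain.abs().max().item())   # stays order-one
```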
Another big sweep, just to see.
Test out candecay values; they still make no difference.
Tinker with lr and plr, changing the code to allow for more immediate changes, ignoring plasticity this time.
Does the batch size screw things up? Perhaps it gets everything stuck in local minima. Maybe a small batch will let us break out, encouraging exploration.
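One way to probe this is a quick batch-size sweep: smaller batches give noisier gradient estimates, which can act like exploration. The sketch below uses a toy linear-regression stand-in, not the actual model or data from this log.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# toy dataset; the real task isn't part of this log
X = torch.randn(256, 8)
y = torch.randn(256, 1)
dataset = TensorDataset(X, y)

for batch_size in (4, 16, 64):  # small batches -> noisier updates -> more exploration
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    model = torch.nn.Linear(8, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for xb, yb in loader:  # one epoch per setting
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(xb), yb)
        loss.backward()
        opt.step()
    print(batch_size, loss.item())
```

Comparing final losses (or logged avg_loss curves) across these settings would show whether small batches actually break out of the plateau.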