Remove the recurrent connection. Try to get the RNN running anyway, using weight updates for short-term memory. The plasticity values should be instrumental here: some weights are more plastic than others, and those change quickly enough to store information.
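A minimal numpy sketch of the idea, in the style of differentiable plasticity (all names and values here are illustrative assumptions, not the actual model): each weight pairs a slow learned value with a learned per-weight plasticity coefficient, and a fast Hebbian trace carries the short-term memory.

```python
import numpy as np

# Sketch (assumed names/shapes): effective weight = slow w + alpha * Hebbian trace.
# Weights with large alpha change quickly and act as short-term storage.
rng = np.random.default_rng(0)
n_in, n_out = 4, 3

w = rng.normal(0.0, 0.1, (n_in, n_out))      # slow, learned weights (long-term)
alpha = rng.uniform(0.0, 1.0, (n_in, n_out)) # learned per-weight plasticity
hebb = np.zeros((n_in, n_out))               # fast Hebbian trace (short-term memory)
eta = 0.1                                    # trace update speed

def step(x, hebb):
    y = np.tanh(x @ (w + alpha * hebb))             # plastic effective weights
    hebb = (1 - eta) * hebb + eta * np.outer(x, y)  # decaying Hebbian update
    return y, hebb

x = rng.normal(size=n_in)
y1, hebb = step(x, hebb)
y2, hebb = step(x, hebb)  # same input, different output: the trace shifted the weights
```

The trace decays at rate `eta`, so information written into high-`alpha` weights fades over a few steps instead of persisting like the slow weights do.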
Here's the whole sweep, over learning rate, plasticity learning rate, and batch size. I just want to catch the high performers: the runs that reach the lowest loss the fastest.
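A sketch of what that sweep configuration might look like in wandb's dict format; the parameter names, ranges, and search method are my assumptions, not the actual sweep:

```python
# Hypothetical wandb-style sweep config over the three knobs above.
sweep_config = {
    "method": "random",
    "metric": {"name": "avg_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [1e-4, 3e-4, 1e-3, 3e-3]},
        "plasticity_lr": {"values": [1e-3, 1e-2, 1e-1]},
        "batch_size": {"values": [1, 8, 32, 128]},
    },
}
```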
[Line chart: avg_loss across the sweep]
Which batch size performs best? And is batch size really the variable I care about?
I really don't want learning rate to be the defining factor here, but it seems to be. That just drives us into the local minimum of repeating the input character back, and no memory is needed for that, unfortunately.
I'd like to see this have a bigger influence. The plasticity learning rate should determine how sharply the plasticity values separate short-term memory connections from long-term ones.
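One way to read "plasticity learning rate" is as a separate step size applied only to the plasticity coefficients, so they can spread apart faster than the base weights. A toy sketch of that split (names, values, and the pretend gradients are all assumptions):

```python
import numpy as np

# Toy SGD step where the plasticity coefficients (alpha) get their own,
# larger learning rate than the slow weights. Gradients here are fake.
lr, plasticity_lr = 1e-3, 1e-1

w = np.zeros(5)            # slow weights
alpha = np.zeros(5)        # per-weight plasticity coefficients
grad_w = np.ones(5)        # pretend gradient for w
grad_alpha = np.ones(5)    # pretend gradient for alpha

w = w - lr * grad_w
alpha = alpha - plasticity_lr * grad_alpha
# alpha moves 100x farther per step, so plasticity values can separate quickly
```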
I want to know more about batch size 1. It's probably what will actually let me encode memory in the weights.
Some more drastic hyperparameter values.
Add some noise.
Add some noise, but with a different sort of initialization.
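A sketch of what this combination could look like: Gaussian noise injected on the inputs, plus an orthogonal-style weight init via QR in place of small-normal. The specific noise level and init choice are assumptions about the setup, not the recorded runs.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_orthogonal(n_in, n_out):
    """Orthogonal-ish init: QR of a random Gaussian matrix (assumed variant)."""
    a = rng.normal(size=(n_in, n_out))
    q, _ = np.linalg.qr(a)  # columns are orthonormal
    return q

noise_std = 0.1  # illustrative value
x = rng.normal(size=8)
x_noisy = x + noise_std * rng.normal(size=x.shape)  # input noise during training

w = init_orthogonal(8, 8)
```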