Static plasticity, lr 1e-3, 5 layers, 1024 parameters in each layer. The main difference is the dataset: this one uses long_range_memory_dataset, which is mostly zeros but contains a pattern where the model has to store a value and access it later.
The goal is to see whether this larger model manages to correctly store and retrieve information that was only presented once.
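For a rough picture of the task, here is a hypothetical sketch of a long_range_memory-style batch generator. The exact format of the real long_range_memory_dataset (channels, sequence length, where the value and the recall cue land) is assumed here, not taken from its code.

```python
import numpy as np

def long_range_memory_batch(batch_size=32, seq_len=256, rng=None):
    """Hypothetical sketch: sequences are mostly zeros, a value is written
    exactly once early on, and the target asks the model to reproduce that
    value at a later cue position."""
    rng = rng or np.random.default_rng()
    x = np.zeros((batch_size, seq_len, 2), dtype=np.float32)  # channels: [value, cue]
    y = np.zeros((batch_size, seq_len), dtype=np.float32)
    for b in range(batch_size):
        store_t = rng.integers(0, seq_len // 4)              # value appears early
        recall_t = rng.integers(3 * seq_len // 4, seq_len)   # cue appears late
        value = rng.uniform(-1.0, 1.0)
        x[b, store_t, 0] = value   # the value, shown only once
        x[b, recall_t, 1] = 1.0    # cue flag: "output the stored value now"
        y[b, recall_t] = value     # target is zero everywhere else
    return x, y
```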
[Line chart: avg_loss]
Same dimensions, different dataset, different lr. This one uses the baby names dataset. I'd like to see whether the slower-learning parameters manage to preserve important information while still letting the model adapt to the current context of the name being predicted. This model ought to do better than a baseline with uniform plasticity values. Some runs may have a wider range of possible plasticity values; I forget. "worldly sea" is the uniform baseline, as is "morning-star".
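For reference, this is roughly what per-parameter plasticity means in these runs, as a minimal sketch rather than the actual training code: each weight gets its own fixed multiplier on its update, and the uniform baseline just sets every multiplier to the same value. The init range here is an assumption, since the range used in the runs isn't stated.

```python
import torch

def init_plasticity(params, uniform=True, low=0.1, high=2.0):
    """Uniform baseline ("worldly sea" / "morning-star" style): every parameter
    gets plasticity 1.0. Per-parameter variant: each weight gets its own fixed
    multiplier drawn from an assumed range [low, high]."""
    if uniform:
        return [torch.ones_like(p) for p in params]
    return [torch.empty_like(p).uniform_(low, high) for p in params]

@torch.no_grad()
def plastic_sgd_step(params, plasticities, base_lr=1e-3):
    """One SGD step where each weight's update is scaled by its own
    (static) plasticity value."""
    for p, plast in zip(params, plasticities):
        if p.grad is not None:
            p -= base_lr * plast * p.grad  # effective lr = base_lr * plasticity
```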
[Run set panel]
Let's try adding in an EMA of the weights...
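A minimal sketch of what "EMA of the weights" would look like here, assuming a standard shadow-copy exponential moving average; the decay value is a placeholder, the report doesn't state one.

```python
import copy
import torch

class WeightEMA:
    """Keeps a shadow copy of the model whose weights track an exponential
    moving average of the live weights."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```

Typical usage would be calling ema.update(model) after each optimizer step and evaluating with ema.shadow.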
[Run set panel: 4701]
OK, now that we see some results, let's let the model change some of that plasticity (no EMA), so we're just varying the plasticity learning rate here.
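Concretely, "changing the plasticity learning rate" can be read as making the plasticity tensors trainable and giving them their own optimizer group with a separate lr. This is a hedged sketch of that setup; how gradients actually reach the plasticity values depends on the model's own (differentiable) update rule, which isn't shown here.

```python
import torch

def make_plastic_optimizer(weight_params, weight_lr=1e-3, plasticity_lr=1e-4):
    """Two-group optimizer: weights keep their usual lr, while the trainable
    plasticity tensors get their own, separate learning rate (the knob being
    swept across these runs)."""
    plasticity_params = [torch.nn.Parameter(torch.ones_like(p)) for p in weight_params]
    opt = torch.optim.Adam([
        {"params": weight_params, "lr": weight_lr},
        {"params": plasticity_params, "lr": plasticity_lr},
    ])
    return opt, plasticity_params
```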