limit effective lr
experiments with static plasticity and with trainable plasticity
Created on August 5 | Last edited on August 6
Static plasticity, lr 1e-3, 5 layers, 1024 parameters in each layer. The main difference is the dataset: it uses long_range_memory_dataset, which is mostly zeros but has a pattern where the model needs to store a value to be accessed later.
The goal is to see whether this larger model manages to correctly store and retrieve information that is only provided once.
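For reference, here is a minimal sketch of what such a task generator might look like. The actual long_range_memory_dataset format, sequence length, and value encoding are assumptions, not the real dataset code:

```python
import numpy as np

def long_range_memory_batch(seq_len=256, batch_size=32, seed=None):
    """Hypothetical sketch: sequences are mostly zeros; a value appears
    once early on, and the target is to reproduce it much later."""
    rng = np.random.default_rng(seed)
    x = np.zeros((batch_size, seq_len), dtype=np.float32)
    y = np.zeros((batch_size, seq_len), dtype=np.float32)
    for i in range(batch_size):
        store_t = rng.integers(0, seq_len // 4)             # value shown once, early
        recall_t = rng.integers(3 * seq_len // 4, seq_len)  # queried much later
        value = rng.uniform(-1.0, 1.0)
        x[i, store_t] = value
        y[i, recall_t] = value  # target is nonzero only at recall time
    return x, y
```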
[Run set: 3 runs]
Same dimensions, different dataset, different lr. Uses the baby names dataset. I'd like to see whether the slower-learning parameters manage to preserve important information, while still allowing the model to adapt to the current context of the name being predicted. This model ought to do better than a baseline that has uniform plasticity values. Some runs may have a wider range of possible plasticity values; I forget. "worldly sea" is the uniform baseline, as is "morning-star".
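As a reminder of the mechanism, a minimal sketch of a static-plasticity update, where each weight's effective learning rate is lr scaled by a fixed per-weight plasticity. Names and the exact update rule are assumptions, not this report's code; the uniform baseline corresponds to all-ones plasticity:

```python
import torch

def plastic_sgd_step(params, plasticity, lr=1e-3):
    """Scale each parameter's update by its (fixed) plasticity, so
    low-plasticity weights change slowly and preserve old information,
    while high-plasticity weights adapt quickly to the current context."""
    with torch.no_grad():
        for p, s in zip(params, plasticity):
            if p.grad is not None:
                p -= lr * s * p.grad  # effective lr = lr * s, elementwise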
[Run set: 6 runs]
Let's try adding an EMA of the weights...
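A minimal sketch of a weight EMA, assuming the standard shadow-copy formulation; the decay value is a placeholder, not the one used in these runs:

```python
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    """After each optimizer step, blend the online weights into a
    slow-moving shadow copy of the model."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

# usage: ema_model = copy.deepcopy(model), then call update_ema(...) each step
```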
[Run set: 1706 runs]
OK, now that we see some results, let's let the model change some of that plasticity (no EMA), so we're just changing the plasticity learning rate here.
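A minimal sketch of one way to make plasticity trainable with its own learning rate, loosely in the differentiable-plasticity style. The parameterization, dimensions, and lr values here are assumptions, not the report's actual config:

```python
import torch

class PlasticLinear(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.slow = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.fast = torch.nn.Parameter(torch.zeros(d_out, d_in))
        # sigmoid keeps per-weight plasticity in (0, 1), bounding how much
        # the fast weights can contribute (i.e. limiting the effective lr)
        self.raw_plasticity = torch.nn.Parameter(torch.zeros(d_out, d_in))

    def forward(self, x):
        w = self.slow + torch.sigmoid(self.raw_plasticity) * self.fast
        return x @ w.t()

layers = [PlasticLinear(32, 32) for _ in range(5)]
model = torch.nn.Sequential(*layers)
opt = torch.optim.SGD([
    {"params": [p for l in layers for p in (l.slow, l.fast)], "lr": 1e-3},
    # the knob being swept in these runs: the plasticity learning rate
    {"params": [l.raw_plasticity for l in layers], "lr": 1e-4},
])
```

Because the plasticity enters the forward pass, it receives gradients like any other parameter; giving it its own optimizer group is what makes "plasticity learning rate" a separate knob to sweep.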
[Run set: 4 runs]
Again, with long_range_memory_dataset.
[Run set: 5 runs]