regularize to effective lr
A new mechanism keeps the average plasticity value around 1, so the overall effective lr stays the same.
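One way to read this mechanism is as follows. If each parameter's step size is lr times its plasticity value, then the average effective lr is lr times the mean plasticity, so holding that mean near 1 holds the average step at lr. The sketch below is a minimal illustration under that assumption. The helper names are hypothetical, and because the report does not say which form the mechanism takes, it shows both a hard renormalization and a soft penalty.

```python
from statistics import fmean


def renormalize_plasticity(plasticity):
    """Hypothetical helper: rescale plasticity values so their mean is exactly 1."""
    mean = fmean(plasticity)
    if mean <= 0:
        raise ValueError("plasticity mean must be positive to renormalize")
    return [p / mean for p in plasticity]


def mean_one_penalty(plasticity):
    """Hypothetical soft alternative: squared distance of the mean plasticity from 1."""
    return (fmean(plasticity) - 1.0) ** 2


def average_effective_lr(lr, plasticity):
    """Average per-parameter step size, assuming each parameter uses lr * plasticity."""
    return lr * fmean(plasticity)


lr = 1e-3
plasticity = [0.5, 2.0, 3.5]                     # mean is 2.0
normalized = renormalize_plasticity(plasticity)  # mean is 1.0

print(average_effective_lr(lr, plasticity))  # twice the base lr, since the mean is 2.0
print(average_effective_lr(lr, normalized))  # back to the base lr
```

Hard renormalization pins the mean exactly but couples every value to every other; a penalty term only nudges the mean toward 1 and leaves the trade-off to its weight in the loss.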
Created on August 19 | Last edited on August 20
Sweat the small network: hidden size of 512 and 5 layers.
Run set (30 runs)
Let's temporarily make plast_clip a lower-bound clip; I noticed some interesting dynamics with it. For reference, lr and plr are both fixed at 1e-3, and the graph above uses a lower-bound clip of 1e-7 on plasticity.
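A minimal sketch of what a lower-bound-only plast_clip could look like inside a plain SGD step. It assumes each weight's step is lr times its plasticity and that the plasticity values take their own gradient step with plr. The update rule, the gradient inputs, and the helper names are assumptions for illustration. Only the 1e-3 values for lr and plr and the 1e-7 clip come from the runs described here.

```python
LR = 1e-3          # weight learning rate (fixed at 1e-3 in these runs)
PLR = 1e-3         # plasticity learning rate (fixed at 1e-3 in these runs)
PLAST_CLIP = 1e-7  # lower bound applied to each plasticity value


def lower_bound_clip(plasticity, plast_clip=PLAST_CLIP):
    """Clamp plasticity values from below only; large values pass through unchanged."""
    return [max(p, plast_clip) for p in plasticity]


def sgd_step(weights, weight_grads, plasticity, plast_grads,
             lr=LR, plr=PLR, plast_clip=PLAST_CLIP):
    """Hypothetical plain-SGD step.

    Each weight moves with step size lr * plasticity (using the current plasticity),
    then the plasticity values take their own SGD step with plr and are
    lower-bound clipped so none can reach zero or go negative.
    """
    new_weights = [w - lr * p * g for w, p, g in zip(weights, plasticity, weight_grads)]
    new_plasticity = [p - plr * gp for p, gp in zip(plasticity, plast_grads)]
    return new_weights, lower_bound_clip(new_plasticity, plast_clip)


weights, plasticity = sgd_step(
    weights=[0.2, -0.4], weight_grads=[0.1, 0.3],
    plasticity=[1.0, 5e-8], plast_grads=[0.0, 0.0],
)
print(plasticity)  # the 5e-8 value is lifted to the 1e-7 floor
```

With only a lower bound, nothing caps how large a single plasticity value can grow, which is one place unusual dynamics could come from.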
Run set (19 runs)
A bigger version of that first set, with the lower bound set back to 1e-7, but with 15 layers and a hidden size of 1024.
Run set (16 runs)