
regularize to effective lr

A new mechanism keeps the average of the plasticity values around 1, so that the overall effective lr stays the same.
Created on August 19 | Last edited on August 20
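A minimal sketch of the idea, assuming each parameter's effective lr is a shared base lr multiplied by its own plasticity value, and that the mechanism simply divides the plasticity values by their mean. The function names `normalize_plasticity` and `sgd_step` are hypothetical, and the report does not say this is exactly how the mechanism is implemented. Once the mean plasticity is 1, the average effective lr equals the base lr, even though individual parameters still get larger or smaller steps.

```python
# Hypothetical sketch: plasticity values scale a shared base lr per parameter.
# Rescaling them to mean 1 keeps the *average* effective lr equal to base_lr.

def normalize_plasticity(plasticity):
    """Rescale plasticity values so their mean is 1 (assumes the mean is > 0)."""
    mean = sum(plasticity) / len(plasticity)
    return [p / mean for p in plasticity]


def sgd_step(params, grads, plasticity, base_lr):
    """One plain SGD step where parameter i uses lr = base_lr * normalized plasticity[i]."""
    plast = normalize_plasticity(plasticity)
    return [w - base_lr * p * g for w, g, p in zip(params, grads, plast)]


plast = normalize_plasticity([0.5, 2.0, 3.5])
print(plast)  # mean was 2.0, so this prints [0.25, 1.0, 1.75]
```

Relative differences between parameters are preserved; only the overall scale is pinned, which is what keeps the total step size comparable across runs.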
Sweat the small network: hidden size 512, 5 layers.

[Figure: curves vs. Step (0 to 14k), one per run. Legend run labels (pairs of values, as shown): 1 / 1e-10; 0.5 / 1e-10; 0.5 / 1e-15; 0.1 / 1e-3; 0.1 / 1e-5; 0.1 / 1e-7; 0.1 / 1e-10; 0.05 / 1e-4; 0.01 / 0.01; 0.01 / 1e-3; 0.01 / 1e-4; 0.01 / 1e-5; 0.001 / 0.1; 0.001 / 0.01; 0.001 / 1e-3; 0.001 / 1e-4; 0.001 / 1e-5; 0.0001 / 1e-3]
Run set: 30 runs


Let's temporarily make plast_clip a lower-bound clip; I noticed some interesting dynamics with it. For reference, lr and plr are fixed at 1e-3, and the graph above uses a lower-bound clip on plasticity of 1e-7.
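A minimal sketch of what a lower-bound-only clip means here, assuming plast_clip acts as a floor applied element-wise to the plasticity values and that the effective lr is lr times plasticity. The helper names `lower_bound_clip` and `effective_lrs` are hypothetical; only `plast_clip`, lr = 1e-3 and the 1e-7 floor come from the report. Values below the floor are raised to it, and large values are left untouched, so the smallest possible effective lr is lr * plast_clip (about 1e-10 with these settings).

```python
# Hypothetical sketch: plast_clip as a floor only (no upper bound).

def lower_bound_clip(plasticity, plast_clip=1e-7):
    """Raise any plasticity value below plast_clip to plast_clip; leave the rest as-is."""
    return [max(p, plast_clip) for p in plasticity]


def effective_lrs(plasticity, lr=1e-3, plast_clip=1e-7):
    """Per-parameter effective lr = lr * clipped plasticity (assumed relation)."""
    return [lr * p for p in lower_bound_clip(plasticity, plast_clip)]


print(lower_bound_clip([0.0, 1e-9, 0.5, 3.0]))  # [1e-07, 1e-07, 0.5, 3.0]
```

With only a floor, a few parameters can grow very large plasticity values, which may be related to the dynamics noted above; this is a guess, not something the runs establish.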

Run set: 19 runs

A bigger version of that first set, with the lower bound set back to 1e-7, but with 15 layers and a hidden size of 1024.

Run set: 16 runs