[Line chart: avg_loss]
lr at 1e-8
sweep_working_long - similar parameters, just a longer runtime
These runs use a learning rate of 1e-8, but that is balanced by plasticity caps as high as 1e8. Training is usually unstable immediately at that cap, though. Plasticity learning rates were tested at 1e-2, 1e-3, and 1e-4.
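
For reference, the settings described above could be written down as a grid-style sweep configuration. This is only a sketch: the parameter names (`lr`, `plasticity_cap`, `plasticity_lr`), the grid search method, and the choice to minimize `avg_loss` are assumptions, not taken from the actual sweep definition. Only the numeric values come from the notes, and only the largest plasticity cap is shown because the other cap values are not listed.

```python
from itertools import product

# Hypothetical parameter names; only the numeric values come from the notes.
sweep_config = {
    "method": "grid",  # assumed search strategy
    "metric": {"name": "avg_loss", "goal": "minimize"},  # assumed objective
    "parameters": {
        "lr": {"value": 1e-8},
        # Largest cap mentioned; any smaller caps are not stated in the notes.
        "plasticity_cap": {"value": 1e8},
        "plasticity_lr": {"values": [1e-2, 1e-3, 1e-4]},
    },
}


def expand_grid(config):
    """Enumerate every parameter combination a grid sweep would visit."""
    params = config["parameters"]
    names = list(params)
    # A parameter has either a list of "values" or a single fixed "value".
    choices = [p["values"] if "values" in p else [p["value"]] for p in params.values()]
    return [dict(zip(names, combo)) for combo in product(*choices)]


runs = expand_grid(sweep_config)
for run in runs:
    print(run)
```

With one fixed learning rate, one cap, and three plasticity learning rates, this expands to three runs. Adding more cap values under `plasticity_cap` would multiply the grid accordingly.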