[Line chart: avg_loss for the selected run set]
Ok, now let's see its performance on other datasets.
Let's see if I can push the more complex datasets through. The model only really starts retaining longer-term dependencies around the 0.7 loss mark, and the loss appears to rise again after the 2.5k-iteration mark, so I'd bet that adding layers and shrinking the learning rate will break through that barrier.
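One way to act on that rising-loss observation is to decay the learning rate automatically once the recent average loss stops improving. Here's a minimal sketch of that idea; the function name, window size, and decay factor are all illustrative choices, not taken from the actual run configuration:

```python
def adjust_lr(lr, loss_history, window=100, factor=0.5, min_lr=1e-6):
    """Halve the learning rate when the recent average loss stops improving.

    `loss_history` holds per-iteration avg_loss values. All names and
    defaults here are hypothetical, chosen only to illustrate the idea.
    """
    if len(loss_history) < 2 * window:
        return lr  # not enough history yet to compare two windows
    recent = sum(loss_history[-window:]) / window
    previous = sum(loss_history[-2 * window:-window]) / window
    # Loss rising (or flat) over the last window -> decay the learning rate.
    if recent >= previous:
        return max(lr * factor, min_lr)
    return lr
```

Calling this every iteration (or every few hundred) would catch the rise around 2.5k iterations and cut the step size before the run diverges further.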