high lr, tryna be fast.
an attempt to be slow again
absolutely not fast.
truly stabilize
perhaps, now that the memory leak is figured out, we can do even better with a lower plast_clip. grad clip 0. forget rate 0.3.
Select runs that logged avg_loss
to visualize data in this line chart.
Select runs that logged avg_loss
to visualize data in this line chart.
idk maybe having a variety of palindrome lengths will stabilize this. grad clip 0