positive plasticity only.
use GELU activation
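The GELU entry can be illustrated with a minimal pure-Python sketch. The function names `gelu`, `gelu_tanh` and `gelu_grad` are hypothetical and do not come from this project. Later notes show that the training code computes activation derivatives by hand, so the sketch also includes the derivative. Exact GELU is x·Φ(x), where Φ is the standard normal CDF. The tanh form is the widely used approximation.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Common tanh-based approximation of GELU."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

def gelu_grad(x: float) -> float:
    """Derivative of exact GELU: Phi(x) + x * phi(x), phi = normal PDF."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf
```

Unlike ReLU, the derivative is nonzero for negative inputs, so a ReLU-style `z > 0` mask would not be a valid substitute in backprop.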
use MSE loss instead of cross-entropy.
slower last layer; remove the ReLU derivative calculation, since I'm using sigmoid right now.
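The next sketch shows the output-layer gradient when MSE is combined with a sigmoid output. It is a minimal pure-Python illustration that assumes the loss is averaged over the n outputs, L = mean((s − y)²). The names `sigmoid`, `sigmoid_grad_from_output` and `mse_output_delta` are hypothetical. The sigmoid derivative comes from the output itself as s(1 − s), so the ReLU mask is no longer needed. With MSE the s(1 − s) factor remains in the delta. With binary cross-entropy on a sigmoid output it would cancel, leaving (s − y)/n.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad_from_output(s: float) -> float:
    # sigmoid'(z) = s * (1 - s), where s = sigmoid(z); no ReLU mask involved
    return s * (1.0 - s)

def mse_output_delta(outputs, targets):
    """dL/dz per output unit for L = mean((s - y)^2) with s = sigmoid(z)."""
    n = len(outputs)
    return [
        2.0 * (s - y) / n * sigmoid_grad_from_output(s)
        for s, y in zip(outputs, targets)
    ]
```

Because of the extra s(1 − s) factor, the gradient shrinks when outputs saturate near 0 or 1. This can interact with how fast the last layer is set to learn.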
even slower last layer
actually make the last layer faster
maybe fixed pos_only
go mini