str

Created on June 28|Last edited on July 3
Five layers deep, so it's got to be learning rates of 1e-6 and 1e-8, with a batch size of 32. I vary the upper limit of the plasticity; on my own machine, I vary the lower limit.
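As a minimal sketch of the knob being swept here, this is what clamping per-synapse plasticity coefficients between a lower and an upper limit might look like. The names (`alpha`, `plasticity_low`, `plasticity_high`) and the values are illustrative assumptions, not taken from the actual runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-synapse plasticity coefficients for a small layer.
alpha = rng.normal(0.0, 0.5, size=(4, 4))

# The lower / upper limits being varied in the sweeps above (illustrative values).
plasticity_low, plasticity_high = -0.1, 0.3

# Clamp every coefficient into [plasticity_low, plasticity_high].
alpha_clipped = np.clip(alpha, plasticity_low, plasticity_high)
```

Varying only one of the two bounds (as in these runs) just means holding the other fixed while sweeping.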

[Line chart: avg_loss across the selected runs]

Learning rates of 1e-7 and 1e-9, batch size of 1 now. I vary the plasticity clip again.

Shoot - now it looks like, compared to backprop, I've got issues with deep layers. That's sad. Time to try to go deeper. I'll start from what works and go from there.

Getting recurrent - I start from what works first: the vanilla candidate, without meta-plasticity. Then I try to get meta-plastic networks to the same performance, which is especially hard on progressively harder datasets.
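To make the vanilla-vs-plastic comparison concrete, here is a minimal sketch of the two candidates in the style of differentiable plasticity: fixed weights only, versus fixed weights plus a Hebbian trace gated by per-synapse coefficients. All names (`w`, `alpha`, `hebb`, `eta`) and the update rule are assumptions for illustration; the rule in these runs may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

w = rng.normal(0.0, 0.1, (n, n))      # slowly learned, fixed-per-step weights
alpha = rng.normal(0.0, 0.1, (n, n))  # per-synapse plasticity coefficients
hebb = np.zeros((n, n))               # fast Hebbian trace, starts empty
eta = 0.1                             # trace update rate (hypothetical value)

x = rng.normal(0.0, 1.0, n)

# Vanilla candidate: plain fixed weights.
y_vanilla = np.tanh(w @ x)

# Plastic candidate: fixed weights plus an alpha-gated Hebbian component.
y_plastic = np.tanh((w + alpha * hebb) @ x)

# After each step, the trace decays and accumulates pre/post co-activity.
hebb = (1 - eta) * hebb + eta * np.outer(y_plastic, x)
```

With `hebb` initialized to zero, the two candidates agree on the very first input and only diverge as the trace accumulates, which is one way to check that the plastic machinery itself isn't breaking convergence.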

Wait -- does plastic-candidate even converge at all?

Time for a big sweep to find out whether I can get 4 layers to work.
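A sketch of what such a sweep could look like in W&B's sweep-config format. The parameter names and value grids are guesses for illustration; only the 4-layer setting comes from the notes.

```python
# Hypothetical sweep configuration (W&B sweep-config shape: method + parameters).
# All parameter names and value grids below are illustrative, not the actual sweep.
sweep_config = {
    "method": "grid",
    "parameters": {
        "n_layers": {"values": [4]},
        "lr": {"values": [1e-8, 1e-7, 1e-6, 1e-5]},
        "plasticity_clip": {"values": [0.01, 0.1, 1.0]},
    },
}
```

A grid over these three axes is 12 runs; `method: "random"` with a `count` cap would be the usual escape hatch if the grid grew much larger.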

Drilling down further. Still at 4 layers, we've identified that the learning rate pretty much has to be smaller than 1e-5.

Trying 5 layers.

Uhhh, something I randomly did here brought 5-layer networks back to working again. I worry that it's less robust than backprop: my learning rules seem to get very volatile as layer width or depth increases.

OK, full sweep over layer size only.
