
[Tuning] Initial HP ranges + Maze Pilot

Life's too short to sweep everything.
Created on February 14|Last edited on March 1
  • Reaffirms multisubject transfer and establishes multitask transfer (though pretty mediocre).
  • All this shows so far is that dropout is critical, perhaps because these models were all size 128. The LR ranges are tolerable.
    • But as expected for single-session models, a wider range of dropout (0.1-0.5) is good.
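For reference, a minimal sketch of what sampling from these ranges could look like. The dropout range (0.1-0.5) is the one noted above; the LR bounds, the function name `sample_config`, and the choice of uniform / log-uniform sampling are assumptions for illustration, not the settings used in these sweeps.

```python
import math
import random


def sample_config(rng, dropout_range=(0.1, 0.5), lr_range=(1e-4, 1e-3)):
    """Draw one hyperparameter configuration.

    dropout_range follows the 0.1-0.5 range from the notes.
    lr_range is a placeholder spanning one order of magnitude;
    the actual bounds are not stated in the report.
    """
    lr_lo, lr_hi = lr_range
    # Log-uniform: sample the exponent uniformly, then exponentiate.
    lr = 10 ** rng.uniform(math.log10(lr_lo), math.log10(lr_hi))
    # Dropout is sampled uniformly on a linear scale.
    dropout = rng.uniform(*dropout_range)
    return {"dropout": dropout, "lr": lr}


rng = random.Random(0)  # fixed seed so the draw is reproducible
configs = [sample_config(rng) for _ in range(8)]
```

Sampling LR on a log scale keeps the draws spread evenly across the decade rather than bunched near the upper bound.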


Maze sweeps


[Plots: sweep curves against Step, epoch, and Time (minutes). Run sets: Maze (68), RTT (40), Small (24), Run set 4 (40).]



[Panel: Run set (68)]


Examining individual runs for health

The dominant indicator of health is dropout. Dropout 0.7 kills.
This holds for the other runs too, including single-session on maze_med, though the training curves in the other plots are at least regular. Compared with dropout, the LR seems inconsequential, though we only search over 1 order of magnitude.
  • In my previous experience, high dropout was needed for single-session models, so perhaps the change to factor mode shifted this regime?
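One rough way to put a number on "dropout matters more than LR" is to compare how much the mean metric moves across the values of each hyperparameter. The sketch below assumes runs are exported as dicts; the helper `spread_by_hparam`, the `score` key, and the demo records are all hypothetical placeholders, not results from this report.

```python
from collections import defaultdict
from statistics import mean


def spread_by_hparam(runs, key, metric="score"):
    """Return max - min of the per-value mean metric for one hyperparameter."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[key]].append(run[metric])
    means = [mean(values) for values in groups.values()]
    return max(means) - min(means)


# Placeholder records for illustration only; NOT numbers from these sweeps.
demo_runs = [
    {"dropout": 0.1, "lr": 1e-4, "score": 0.60},
    {"dropout": 0.1, "lr": 1e-3, "score": 0.62},
    {"dropout": 0.7, "lr": 1e-4, "score": 0.30},
    {"dropout": 0.7, "lr": 1e-3, "score": 0.28},
]

dropout_spread = spread_by_hparam(demo_runs, "dropout")
lr_spread = spread_by_hparam(demo_runs, "lr")
```

This groups on exact values, so continuously sampled hyperparameters would need binning first. A larger spread for one key suggests the metric is more sensitive to it over the searched range.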

In conclusion, my use of dropout for capacity sweeping didn't work well, or was too aggressive.


[Panel: Run set (32)]


Supervised setting for RTT decoding: less regularization is almost always better at our current capacities.

[Panel: Run set (8)]