[Tuning] Initial HP ranges + Maze Pilot
Life's too short to sweep everything.
Created on February 14 | Last edited on March 1
- Reaffirms multi-subject transfer and establishes multi-task transfer (though it is pretty mediocre).
- All this shows so far is that dropout is critical, perhaps because these models were all size 128. The LR ranges are tolerable.
- But, as expected, a wider range of dropout (0.1-0.5) works well for single-session models.
[Figure panels: Maze sweeps; RTT]
Examining individual runs for health
The dominant indicator of health is dropout. Dropout 0.7 kills.
This holds for the other runs too, including single-session on maze_med, though the training curves are at least regular in the other plots. Compared to dropout, the LR seems inconsequential (though we only search over one order of magnitude).
- Since, in my previous experience, high dropout was needed for single-session models, perhaps the change to factor mode changed this regime?
In conclusion, my use of dropout for capacity sweeping didn't work well, or was too aggressive.
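As a rough illustration of the narrower search these notes point toward, here is a minimal sketch of a random-search sampler. It assumes uniform sampling of dropout over the 0.1-0.5 range mentioned for single-session models and log-uniform sampling of LR over one order of magnitude. The LR bounds, the function names, and the sampling scheme are all hypothetical; they are not taken from the actual sweep configuration.

```python
import math
import random

# Hypothetical search space. The dropout bounds follow the 0.1-0.5 range
# noted above; the learning-rate bounds are placeholder values that span
# one order of magnitude and are NOT taken from the report.
DROPOUT_RANGE = (0.1, 0.5)
LR_RANGE = (1e-4, 1e-3)


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration: uniform dropout, log-uniform learning rate."""
    dropout = rng.uniform(*DROPOUT_RANGE)
    log_lr = rng.uniform(math.log10(LR_RANGE[0]), math.log10(LR_RANGE[1]))
    return {"dropout": dropout, "lr": 10 ** log_lr}


def sample_sweep(n_runs: int, seed: int = 0) -> list:
    """Draw n_runs configurations from a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [sample_config(rng) for _ in range(n_runs)]


if __name__ == "__main__":
    for cfg in sample_sweep(5):
        print(f"dropout={cfg['dropout']:.2f}  lr={cfg['lr']:.2e}")
```

Sampling LR in log space keeps the draws spread evenly across the decade rather than clustered near the upper bound, which matters less here given how little LR seemed to affect run health.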
Supervised setting for RTT decoding: less regularization is almost always better at our current capacities.