[Tuning] Initial HP ranges + Maze Pilot

Life's too short to sweep everything.
Created on February 14 | Last edited on March 1

This reaffirms multi-subject transfer and establishes multitask transfer (though the multitask transfer is fairly mediocre).
So far, all this shows is that dropout is critical, perhaps because these models were all size 128; the LR ranges are tolerable.
As expected for single-session models, however, a wider range of dropout (0.1 to 0.5) works well.
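For reference, a dropout × LR grid of the kind discussed here can be sketched as a Cartesian product, with dropout drawn from the 0.1 to 0.5 band and the LR log-spaced over a single decade (the LR search in these sweeps covers only about one order of magnitude). This is a minimal sketch under assumptions: the helper names `log_spaced` and `build_grid` and the LR endpoints `1e-4` and `1e-3` are hypothetical placeholders, not the configuration actually used for these runs.

```python
import itertools
import math


def log_spaced(lo: float, hi: float, n: int) -> list[float]:
    """Return n values evenly spaced in log10 space from lo to hi (inclusive)."""
    if n == 1:
        return [lo]
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]


def build_grid(dropouts, lr_lo: float, lr_hi: float, n_lr: int) -> list[dict]:
    """Cartesian product of dropout values and log-spaced learning rates."""
    lrs = log_spaced(lr_lo, lr_hi, n_lr)
    return [{"dropout": d, "lr": lr} for d, lr in itertools.product(dropouts, lrs)]


# Hypothetical ranges: dropout over the 0.1-0.5 band, LR over one decade.
# The LR endpoints are placeholders, not the values used in this report.
grid = build_grid([0.1, 0.2, 0.3, 0.4, 0.5], lr_lo=1e-4, lr_hi=1e-3, n_lr=3)
```

Log spacing is the usual choice for LR because its effect scales roughly multiplicatively; a linear grid over the same decade would place most samples near the upper end in log terms.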

Maze sweeps
[Figure: eval_loss vs. step, vs. epoch, and vs. wall-clock time (minutes). Run sets: Maze (68 runs), RTT (40 runs), Small (24 runs), Run set 4 (40 runs).]

[Run set: 68 runs]

Examining individual runs for health
The dominant indicator of run health is dropout: dropout 0.7 kills training.
This holds for the other runs too, including single-session runs on maze_med, although the training curves in the other plots are at least regular. Relative to dropout, the LR seems inconsequential (though we only search over one order of magnitude).
In my previous experience, single-session models needed high dropout; perhaps the change to factor mode shifted this regime?
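To make "dominant indicator" a bit more concrete, one can compare how far the mean final eval_loss moves across the values of each hyperparameter (a simple main-effect screen). A minimal sketch, assuming each run has already been exported as a dict holding its hyperparameters and final `eval_loss`; that record format and the helpers `marginal_spread` and `rank_hyperparameters` are hypothetical, not part of this report or of the W&B API.

```python
from collections import defaultdict
from statistics import mean


def marginal_spread(runs: list[dict], param: str, metric: str = "eval_loss"):
    """Spread (max - min) of the mean metric across the values of one hyperparameter.

    `runs` uses a hypothetical record format: one dict per run with
    hyperparameter keys (e.g. "dropout", "lr") and a final metric value.
    Returns (spread, {param_value: mean_metric}).
    """
    groups = defaultdict(list)
    for run in runs:
        groups[run[param]].append(run[metric])
    means = {value: mean(vals) for value, vals in groups.items()}
    return max(means.values()) - min(means.values()), means


def rank_hyperparameters(runs: list[dict], params: list[str], metric: str = "eval_loss") -> list[str]:
    """Order hyperparameters from largest to smallest marginal spread of the metric."""
    spreads = {p: marginal_spread(runs, p, metric)[0] for p in params}
    return sorted(spreads, key=spreads.get, reverse=True)
```

Called as `rank_hyperparameters(runs, ["dropout", "lr"])`, a pattern like the one described above would put `dropout` first. Marginal means ignore interactions between hyperparameters, so this is a quick screen to complement, not replace, inspecting the training curves.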

In conclusion, using dropout for capacity sweeping didn't work well, or the dropout range was too aggressive.

[Run set: 32 runs]

Supervised setting for RTT decoding: less regularization is almost always better at our current model capacities.

[Run set: 8 runs]