
[Atomicity] MC_RTT 5ms

- On RTT data alone, factor 1 is the only one that achieves competence.
- With Maze data augmented (i.e., 30K base + 3K added, likely not a representational-quality increase), RTT factor 2 becomes feasible.
Created on February 10 | Last edited on February 17
Status: Concluded; factor only affects efficiency (within a 1-4x range), and data may interact with which factors are feasible.
  • Flat models behave a little differently and appear slightly more stable in training.
  • Initial impressions indicated that factor-4 models were not stable on RTT, but continued training eventually stabilized them.
  • Adding maze data improves learning efficiency, at least in factor models.
    • I can't quite tell what difference would make even the val loss differ when maze is included; most of the data is the same, so performance on the Zenodo Indy datasets should be similar. Rerunning to confirm (see @Multisession, Multisubject pilot).
    • The simple truth is that rtt_maze_5_factor_2 is good in a way that's not matched by any run on RTT data alone, except factor_1.
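The augmentation above is just a concatenation of sources at the trial level (~30K RTT base + ~3K Maze added). A minimal sketch of that mixing step, using NumPy with made-up, scaled-down trial counts and bin/channel dimensions (the real run configs and loaders are not shown in this report):

```python
import numpy as np

# Illustrative spike-count tensors, shape (trials, timebins, channels).
# Sizes are scaled down 100x from the ~30K base + ~3K added described above;
# Poisson rate and channel count are placeholders, not the real datasets.
rng = np.random.default_rng(0)
rtt_trials = rng.poisson(0.1, size=(300, 100, 98))   # base RTT trials
maze_trials = rng.poisson(0.1, size=(30, 100, 98))   # added Maze trials

# The augmented training set simply concatenates the two sources
# along the trial axis and shuffles trial order.
combined = np.concatenate([rtt_trials, maze_trials], axis=0)
order = rng.permutation(len(combined))
combined = combined[order]

print(combined.shape)  # (330, 100, 98)
```

At the real scale this is a ~10% addition of Maze trials on top of the RTT base, which is why the report treats it as an augmentation rather than a representational-quality change.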


Factorized models only


[Run set panel: 17 runs]


Including flat models.

  • The picture is complicated once we introduce flat models. Smaller factors don't appear very different, though flat_1 possibly overfits more.
    • That is, flat_1 behaves worse than factor_1 on MC_RTT evaluation.
  • Only flat_1, flat_2, and flat_4 are matched to use only Indy data; they appear comparable to factor_2 and factor_4.
    • rtt_joint_f4_m5 (which includes Loco data and is a slightly bigger model to compensate) similarly matches performance.

[Run set panel: 13 runs]


To revisit a motivating point: factor_4 runs seemed to become unstable, but this doesn't actually hold up after training for longer periods (though factor_4 does appear less efficient).
Flat runs differ in that flat_1, flat_2, and flat_4 are all similarly stable.
Note that the rtt_5_factor_1 multi runs aren't particularly more efficient; the first blue curve (`jlvknwc8`) roughly matches the efficiency of the flat curves, and only sustained training reaches the new plateau.
This plateau is again similarly reached simply by adding maze data.

[Run set panel: 10 runs]




Adding Maze stabilizes learning?

Comparing rtt_5_factor_2 vs rtt_maze_5_factor_2 makes it clear that maze data can sometimes help learning significantly.
In flat models, the verdict is less clear since the trajectories start out quite good; it's plausible that all models would have reached ceiling levels in the next 10 hours of training, with or without maze.


[Run set panel: 8 runs]