Multisession, Multisubject Pilot
Goal: Understand scaling when it probably works
Created on February 13 | Last edited on February 17
When scaling up, we noticed the loss curves were fairly uninspiring as to whether anything was happening. Unsure whether I just didn't know what to look for, we wanted to check back against a scenario where we were fairly sure scaling was happening.
In old runs with multi-animal and multi-session scaling, models appear to need fewer steps and epochs to reach a given loss; the single-session model simply overfits.
- RTT_all was never run with factor, so it uses less memory and the batch size is much larger.
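The "fewer steps to a given loss" claim above can be made concrete by finding the first training step at which each run's loss curve crosses a target value. A minimal sketch, with made-up toy curves (the run names and numbers here are illustrative, not real run data):

```python
# Hypothetical sketch: quantify a leftward efficiency shift by finding the
# first step at which a loss curve drops to or below a target loss.

def steps_to_loss(curve, target):
    """Return the first step whose loss <= target, or None if never reached."""
    for step, loss in curve:
        if loss <= target:
            return step
    return None

# Toy (step, loss) curves; multi-session reaches the target earlier.
single_session = [(100, 1.0), (200, 0.6), (300, 0.45), (400, 0.42)]
multi_session = [(100, 0.9), (200, 0.5), (300, 0.40), (400, 0.38)]

target = 0.5
print(steps_to_loss(single_session, target))  # 300
print(steps_to_loss(multi_session, target))   # 200 -> leftward shift
```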
Old runs
[Run set: 8 runs]
Re-run, with sweeps
In lieu of sweeping capacity, we sweep dropout and learning rate.
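For reference, a sweep like this amounts to a small grid over the two hyperparameters. The specific values swept in these runs aren't recorded here, so the numbers below are placeholders:

```python
# Minimal sketch of a dropout x learning-rate grid sweep.
# The value lists are placeholders, not the values actually swept.
from itertools import product

dropouts = [0.1, 0.3, 0.5]           # placeholder dropout rates
learning_rates = [1e-4, 3e-4, 1e-3]  # placeholder learning rates

sweep = [{"dropout": d, "lr": lr} for d, lr in product(dropouts, learning_rates)]
print(len(sweep))  # 9 configurations
```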
Results are consistent with both multi-session and multi-subject transfer, without any particular conditioning other than providing context. No context ablation done at this point.
- Note: the masking rate is now set to 0.8 for throughput, so scores may be higher than before.
- Also, other than rtt_loco_indy_flat, all other flat references use standard infill, not asymmetric infill (config accident).
- rtt_indy should be competitive with single (if not slightly better), but needed more patience (used the default of 25 instead of 50).
- These results show scaling in session to have the expected leftward shifts in efficiency, still at 20 ms.
- Improvements from scaling are relatively small on the y-axis, on the order of 0.01.
- Duo (and loco single) is probably a bit worse than single since it didn't get much patience; it looks like it's still learning.
- Comparing rtt_indy_flat_8l and rtt_indy_flat_shuffle suggests flatness improves on the regular flat model primarily due to space masking, not capacity. (No comparison to a joint model here.)
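The masking-rate note above can be illustrated with a simple Bernoulli mask. This assumes independent per-position masking at the stated rate of 0.8; the actual infill scheme may differ, so this is only a sketch of why a higher rate gives more masked positions (and thus more training signal) per batch:

```python
# Sketch: independent Bernoulli masking at rate 0.8 (assumed scheme).
import random

def make_mask(n_tokens, mask_rate=0.8, seed=0):
    """Return a boolean mask where True marks a masked position."""
    rng = random.Random(seed)
    return [rng.random() < mask_rate for _ in range(n_tokens)]

mask = make_mask(1000)
print(sum(mask) / len(mask))  # roughly 0.8 of positions are masked
```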
[Run set: Indy all]