Multisession, Multisubject Pilot

Goal: Understand scaling when it probably works
Created on February 13|Last edited on February 17
When scaling up, we noticed the loss curves were pretty uninspiring: it was hard to tell whether anything was happening at all. Unsure whether I just didn't know what to look for, we wanted to check back against a scenario where we were fairly sure scaling was happening.

In old runs with multi-animal and multisession scaling, models appear to need fewer steps and epochs to reach a given loss; the single-session model simply overfits.
  • RTT_all was never run with factor, so it uses less memory and the batch size is much larger.

Old runs


[Figure: old-run loss curves plotted against step and against epoch (log-scale axes); run set of 8 runs.]



Re-run, with sweeps

In lieu of sweeping capacity, we sweep dropout and learning rate.
Results are consistent with both multi-session and multi-subject transfer, without any particular conditioning other than providing context. No context ablation has been done at this point.
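The dropout / learning-rate sweep described above can be sketched as a simple grid search; the grid values and the `train_and_eval` stub below are illustrative placeholders, not the actual sweep config.

```python
from itertools import product

# Hypothetical grid for the dropout / learning-rate sweep; values are
# illustrative, not the ones used in the report.
dropouts = [0.1, 0.3, 0.5]
learning_rates = [1e-4, 3e-4, 1e-3]

def train_and_eval(dropout, lr):
    """Placeholder for a real training run; returns a mock validation loss
    so the sketch runs end to end."""
    return dropout * 0.1 + lr

results = {
    (d, lr): train_and_eval(d, lr)
    for d, lr in product(dropouts, learning_rates)
}
best = min(results, key=results.get)
print("best (dropout, lr):", best)
```

In practice each grid point would be one sweep run, with the validation loss logged rather than returned.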
  • Note: the masking rate is now set to 0.8 for throughput, so scores may be higher than before.
  • Also, other than rtt_loco_indy_flat, all other flat references use standard infill, not asymmetric infill (config accident).
  • rtt_indy should be competitive with single (if not slightly better), but needed more patience (the default of 25 was used instead of 50).
  • These results show scaling in session to have the expected leftward shifts in efficiency, still at 20 ms.
  • Improvements from scaling are relatively small, on the order of 0.01 on the y-axis.
  • Duo (and the loco single) are probably a bit worse than single since they didn't get much patience; they look like they're still learning.
  • Comparing rtt_indy_flat_8l and rtt_indy_flat_shuffle suggests flatness improves on the regular flat model primarily due to space masking, not capacity. (No comparison to the joint model here.)
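The patience point above (rtt_indy stopping early under the default of 25 instead of 50) comes down to standard patience-based early stopping; a minimal sketch, with an illustrative loss curve rather than the actual run's losses:

```python
# Hedged sketch of patience-based early stopping; the curve and patience
# values are illustrative, not from the actual rtt_indy run.
def early_stop_step(losses, patience):
    """Return the step at which training halts (no improvement over the best
    loss for `patience` consecutive checks), or None if the run completes."""
    best, best_step = float("inf"), 0
    for step, loss in enumerate(losses):
        if loss < best:
            best, best_step = loss, step
        elif step - best_step >= patience:
            return step
    return None

# A curve that plateaus briefly, then keeps improving:
curve = [1.0, 0.9, 0.95, 0.94, 0.93, 0.92, 0.91, 0.89]
print(early_stop_step(curve, patience=3))    # 4 -- stops during the plateau
print(early_stop_step(curve, patience=10))   # None -- survives to later gains
```

With low patience the run is cut off during a transient plateau and never sees the later improvement, which is the failure mode suggested for the under-patient runs above.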


Run sets:
  • Single: 75 runs
  • Indy all: 38 runs
  • Loco: 16 runs
  • Duo: 8 runs