[Scale] Inductive bias
Created on March 3|Last edited on March 4
This data is unsorted and has a stable interface, so results are not confounded by artificial channel shifting/padding.
While spacetime is worse at cross-animal transfer at the 800-trial scale, it recovers by 6400 trials.
- Stitch was reduced to a bottleneck of 64 dimensions.
- Cross-animal does appear to scale, though, again, at a factor below standard multi-session.
- It is unclear, for example, whether transfer from cross-animal data saturates differently from cross-session transfer (and I don't think we have the means to really test this), but Rizzoglio's CCA alignment suggests there is nothing fundamentally different about cross-animal.
- That will be empirical.
Similarly, the spacetime model underperforms at 1600 trials across-session, but it also recovers by around 6K trials.
Stitch never actually outperforms flat. I'm going to tentatively conclude that stitching isn't a helpful mechanism, due to the number of parameters it introduces. It is still relatively advantageous when channels are unstable across contexts (e.g. cross-subject or with sorted neurons), because nonflat does fail in this case.
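The parameter argument can be made concrete with a rough count. This is a minimal sketch that assumes stitching means one linear read-in per context, mapping that context's channels to the shared 64-dimensional bottleneck. The function name, the channel count, and the context counts are hypothetical placeholders, not values taken from these runs.

```python
def stitch_param_count(n_contexts: int, n_channels: int,
                       bottleneck: int = 64, bias: bool = True) -> int:
    """Parameters added by per-context linear read-ins (assumed form of stitching)."""
    # Each context gets its own (n_channels x bottleneck) weight matrix,
    # plus an optional bias vector of length `bottleneck`.
    per_context = n_channels * bottleneck + (bottleneck if bias else 0)
    return n_contexts * per_context


# Hypothetical sizes, for illustration only.
for n_contexts in (1, 10, 100):
    print(n_contexts, stitch_param_count(n_contexts, n_channels=96))
```

Under this assumption the added parameters grow linearly with the number of contexts, while each read-in is fit only on its own context's trials.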
So there are several separate throughlines:
- Spacetime is the best architecture, and in particular it overtakes non-spatial by around 6K trials.
- Cross-context transfer does occur and does scale. In-context scales better, but also won't scale beyond some hypothetical limit, e.g. 10K trials.
- A small caveat against direct comparisons with the single_100 line below: the scaling we observe below is acausal, but I really don't think that's an issue.
- Cross-context transfer will likely saturate sooner than in-context, but we can get a respectable amount of scaling (from 100 to 1600 trials).
- But due to power laws we likely will not see great scaling from e.g. 400 trials (I doubt we'll ever see 400 -> 3200).
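One way to read the power-law remark is as diminishing absolute returns: if error falls as a power of trial count, the same multiplicative increase in data buys less when starting from a larger budget. The sketch below assumes a hypothetical error curve of the form floor + a * N^(-b). The constants `a`, `b`, and `floor` are made up for illustration and are not fit to the runs in this report.

```python
def power_law_error(n_trials: float, a: float = 1.0, b: float = 0.5,
                    floor: float = 0.0) -> float:
    """Hypothetical error curve: floor + a * n_trials ** (-b)."""
    return floor + a * n_trials ** (-b)


def absolute_gain(n_start: float, n_end: float, **curve_kwargs) -> float:
    """Error reduction from growing the dataset from n_start to n_end trials."""
    return (power_law_error(n_start, **curve_kwargs)
            - power_law_error(n_end, **curve_kwargs))


# Same 8x increase in trials, starting from two different budgets.
gain_from_100 = absolute_gain(100, 800)
gain_from_400 = absolute_gain(400, 3200)
print(gain_from_100, gain_from_400, gain_from_400 / gain_from_100)
```

Under this assumed form, the ratio of the two gains is (400/100)^(-b) regardless of `a` and `floor`, so the later 8x step always buys less whenever b > 0.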
A weak point is that stitching isn't really examined in depth, but I'm not sure what else to do (PCR is inapplicable).