joelye9

- On RTT data alone, factor 1 is the only one that achieves competence.
- With Maze data added (i.e. 30K base + 3K added, likely not an increase in representational quality), RTT factor 2 becomes feasible.

joelye9

2023-02-10

Multisession, Multisubject Pilot

Goal: Understand scaling when it probably works

joelye9

2023-02-13

[Scale] 100K Proof of concept

Does scaling help us improve 5K Pitt trials?

joelye9

2023-02-11

[Atomicity] Compute-normalized factor size comps

Hm, BPS clouds the picture; let's just look at loss.

joelye9

2023-02-10

[Tuning] Pre-norm, initialization

To scale well we should follow good practices.

joelye9

2023-02-12

[Throughput] Mask Ratio effects

Masking more increases throughput (and perf/flop), but at what cost?

joelye9

2023-02-10

Flat vs factorized

Factorized is objectively much more efficient, but a flat model has more eventual potential and is more agnostic. Per Kaiming's spatiotemporal paper, the flat model might be better at scaled throughputs.

joelye9

2023-02-10

[Atomicity] Factor RTT Maze

4 appears better for BPS; loss is too noisy.

joelye9

2023-02-10

Session context hurts RTT?

joelye9

2023-02-05
