
[Atomicity] Compute-normalized factor size comps

Hm, BPS clouds the picture; let's just look at loss.
Created on February 10|Last edited on February 14

There may be no regime-level difference in achieved performance, but smaller factors may be more sample-efficient in a way that justifies the extra compute.

rtt_f2_m75 converges first.

[Figure: two panels, x-axis: Time (hours)]


Another data point for RTT: unannotated runs use 1 GPU; the flat_<x> series uses 4 GPUs.
  • The best atom depends on budget.
    • Runs converge within a day, with no obvious winner after convergence.
        • Worth a follow-up to see whether variant gaps scale correspondingly with larger data.


  • On the other hand, who cares about the factor model if we're competitive with the best with flat models and we can scale flat more?
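
Since the unannotated runs use 1 GPU and the flat_<x> series uses 4, wall-clock hours alone are not a compute-normalized axis. A minimal sketch of one way to put the runs on a common budget, assuming compute is simply wall-clock hours times GPU count (this ignores utilization and GPU type). The helper names and the example curves are hypothetical placeholders, not values from these runs.

```python
def gpu_hours(wall_clock_hours, num_gpus):
    """Convert wall-clock time to GPU-hours (assumes all GPUs are fully used)."""
    return wall_clock_hours * num_gpus


def loss_at_budget(curve, num_gpus, budget_gpu_hours):
    """Return the last logged loss that fits within a GPU-hour budget.

    curve: list of (wall_clock_hours, loss) pairs, sorted by time.
    Returns None if the first logged point already exceeds the budget.
    """
    latest = None
    for hours, loss in curve:
        if gpu_hours(hours, num_gpus) > budget_gpu_hours:
            break
        latest = loss
    return latest


# Hypothetical curves for illustration only (not taken from the report).
one_gpu_curve = [(1.0, 0.5), (2.0, 0.4), (4.0, 0.3)]
four_gpu_curve = [(1.0, 0.5), (2.0, 0.4), (4.0, 0.3)]

# Same 8 GPU-hour budget: the 4-GPU run only gets 2 wall-clock hours.
print(loss_at_budget(one_gpu_curve, num_gpus=1, budget_gpu_hours=8.0))
print(loss_at_budget(four_gpu_curve, num_gpus=4, budget_gpu_hours=8.0))
```

Comparing losses at matched GPU-hour budgets, rather than at matched wall-clock time, is what makes the "best atom depends on budget" question answerable across the 1-GPU and 4-GPU series.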