Controlled RTT Benefit comparison

Created on March 29|Last edited on March 29

Nearly 1:1 comparison of our RTT experiments against the NLB RTT dataset: the small benefit replicates in our own setting, but there is no benefit on NLB.
  • In-distribution scaling shows RTT is not yet saturated.
  • Varying model parameter count (and other hyperparameters) suggests pretraining scores are relatively robust.
  • The NLB preprocessing could still be nailed down; it may be worth understanding where the improvement comes from.
    • OTOH, I'm not sure about the return on that experiment. Could this simply be a scale issue, or is something else at play? With such a modest effect size, none of the conclusions are very believable.

[Plot: x-axis trainer/global_step (ticks 100 to 10k); y-axis ticks 0.3 to 0.4; run set of 7 runs]