Controlled RTT Benefit Comparison
Created on March 29 | Last edited on March 29
Comment
A near 1:1 comparison of our RTT experiments against the NLB RTT dataset: the small benefit replicates in our own setting, but there is no benefit on NLB.
- In-distribution scaling shows that RTT is definitely not saturated.
- Varying model parameter count (and other tuning parameters) suggests that pretraining scores are relatively robust.
- We could still nail down the NLB preprocessing; it may be worth understanding where the improvement is coming from.
- OTOH, I'm not sure about the return on this experiment. Could this simply be a scale issue, or is something else at play? Given the modest effect size, none of these conclusions are very convincing.
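One way to make the "modest effect size" concern concrete is to put a confidence interval on the pretraining benefit across runs. The sketch below assumes one scalar score per run (e.g. one per seed) for the pretrained and from-scratch conditions; the function name `bootstrap_diff_ci` and its interface are hypothetical, not part of the original analysis or of any W&B API. It uses a plain percentile bootstrap on the difference of means.

```python
import random
from statistics import mean


def bootstrap_diff_ci(treated, control, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treated) - mean(control).

    Hypothetical helper: `treated` and `control` are lists of per-run scores
    (e.g. one score per seed) for the pretrained and from-scratch conditions.
    Returns (point_estimate, (ci_low, ci_high)).
    """
    if not treated or not control:
        raise ValueError("need at least one run per condition")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample runs with replacement, independently within each condition.
        t = [rng.choice(treated) for _ in treated]
        c = [rng.choice(control) for _ in control]
        diffs.append(mean(t) - mean(c))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the differences.
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return mean(treated) - mean(control), (lo, hi)
```

If the resulting interval comfortably includes zero, the benefit cannot be separated from run-to-run noise; with only a handful of runs per condition, expect the interval to be wide.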
Run set: 7 runs