[Chart: optim/learning_rate, showing first 10 runs]
[Chart: eval/dclm/loss, showing first 50 runs]
[Chart: train/loss, showing first 10 runs]
[Chart x-axis: ensemble members]
Main lessons so far
- Batch size has a large effect on results
- lr 3e-3 works well at batch size 64; larger batch sizes may need a smaller lr (rough sketch below)
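A minimal sketch of the second point, using the lr 3e-3 / batch size 64 pair above as an anchor. The inverse scaling rule and the helper name lr_for_batch_size are hypothetical, for illustration only, not the configuration actually used in these runs.

```python
# Illustrative only: anchor lr 3e-3 at batch size 64 (from the note above)
# and shrink the lr as batch size grows. The inverse-proportional rule is
# an assumption, not the rule used in these runs.

BASE_BATCH_SIZE = 64
BASE_LR = 3e-3

def lr_for_batch_size(batch_size: int) -> float:
    """Return a learning rate that decreases as batch size grows (illustrative)."""
    return BASE_LR * BASE_BATCH_SIZE / batch_size

if __name__ == "__main__":
    for bs in (64, 128, 256):
        print(f"batch_size={bs:<4d} lr={lr_for_batch_size(bs):.1e}")
```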
https://wandb.ai/stanford-mercury/suhas-data-efficiency/reports/Data-efficiency-scaling-laws--VmlldzoxMzE3MjUzNQ