Self-distill first try (300M models)
Best single model loss: 3.587
Best two-model ensemble loss: approximately 3.43
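The report does not state how the ensemble loss is computed. As a minimal sketch, assuming the ensemble averages each member's predicted probability for the target token and then takes the negative log, the per-token computation might look like the following. The toy distributions and the function names `token_nll` and `ensemble_nll` are hypothetical and are not taken from the runs.

```python
import math

def token_nll(probs, target):
    """Negative log-likelihood of the target token under one model."""
    return -math.log(probs[target])

def ensemble_nll(model_probs, target):
    """NLL after averaging the target probability across models.

    Assumption: the ensemble averages probabilities, not logits.
    """
    avg = sum(p[target] for p in model_probs) / len(model_probs)
    return -math.log(avg)

# Toy next-token distributions over a 3-token vocabulary (made up).
model_a = [0.7, 0.2, 0.1]
model_b = [0.4, 0.5, 0.1]
target = 0

single_losses = [token_nll(model_a, target), token_nll(model_b, target)]
ensemble_loss = ensemble_nll([model_a, model_b], target)
mean_single_loss = sum(single_losses) / len(single_losses)
```

Under this assumption, Jensen's inequality guarantees that the ensemble loss is never worse than the mean of the members' losses. It does not have to beat the best member on every token. Any gain over the best single model comes from averaging over many tokens where the members err differently.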
Suhas Kotha
Created on July 16 | Last edited on September 8
Section 1
[Figure: eval/dclm/loss (y-axis, ticks from 3.6 to 4.6) against run_progress (x-axis, ticks from 0.2 to 0.8). Legend:]
300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd0.10-bs64
300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd0.20-bs64
300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd1.60-bs64
300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd0.80-bs64
300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd0.40-bs64
300m4k-209Mx16-dclm+sd0805^0.5-cos-lr0.0030-wd0.40-bs64
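The run names above appear to encode hyperparameters as `-lr`, `-wd`, and `-bs` suffixes, plus a value after the `^` character. A small sketch for pulling those fields out of the names, so the sweep can be compared in a table, might look like this. The helper `parse_run_name` and the key `caret_value` are hypothetical, reading `lr`, `wd`, and `bs` as learning rate, weight decay, and batch size is an assumption, and the report does not say what the `^` value means.

```python
import re

RUN_NAMES = [
    "300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd0.10-bs64",
    "300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd0.20-bs64",
    "300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd1.60-bs64",
    "300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd0.80-bs64",
    "300m4k-209Mx16-dclm+sd0805^0.999999-cos-lr0.0030-wd0.40-bs64",
    "300m4k-209Mx16-dclm+sd0805^0.5-cos-lr0.0030-wd0.40-bs64",
]

def parse_run_name(name):
    """Extract numeric fields from a run name; missing fields become None."""
    patterns = {
        "caret_value": r"\^([0-9.]+)",  # meaning not stated in the report
        "lr": r"-lr([0-9.]+)",          # assumed: learning rate
        "wd": r"-wd([0-9.]+)",          # assumed: weight decay
        "bs": r"-bs([0-9]+)",           # assumed: batch size
    }
    fields = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, name)
        fields[key] = float(match.group(1)) if match else None
    return fields

parsed = [parse_run_name(name) for name in RUN_NAMES]

# Values of wd among runs that share the caret value 0.999999.
wd_sweep = sorted(p["wd"] for p in parsed if p["caret_value"] == 0.999999)
```

Read this way, five runs vary only the `wd` field, and the sixth differs from the `wd0.40` run only in the value after `^`.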
Panel sections (with the count shown beside each on the page):
reference: 1
sd0715 runs: 1
ens2d0717 runs: 2
ens4d0721 runs: 1
4 separate: 1
sd0805: 1
8 sep: 1
ablation: 6