
extra experiments

Created on November 14 | Last edited on November 19



Each pre-training batch contains about 0.25M tokens, so 800 batches correspond to about 200M tokens, 1600 batches to about 400M tokens, and so on.
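The batch-to-token conversion can be sketched as below. The 0.25M figure comes from the note above; the exact value of 64 x 4096 tokens per batch is an assumption inferred from the "bs64", "4k", and "209M" fragments in the run names, not something the report states. The helper name `tokens_seen` is hypothetical.

```python
# Tokens per batch as stated in the note above (approximate).
TOKENS_PER_BATCH_APPROX = 250_000

# Assumption (inferred from run names, not stated in the report):
# "bs64" = 64 sequences per batch, "4k" = 4096 tokens per sequence.
SEQUENCES_PER_BATCH = 64
TOKENS_PER_SEQUENCE = 4096
TOKENS_PER_BATCH_EXACT = SEQUENCES_PER_BATCH * TOKENS_PER_SEQUENCE


def tokens_seen(num_batches: int, tokens_per_batch: int) -> int:
    """Return the total number of tokens in `num_batches` batches."""
    return num_batches * tokens_per_batch


# Batch counts that appear in data.max_train_batches.dclm.
for batches in (800, 1600, 3200, 6400):
    approx = tokens_seen(batches, TOKENS_PER_BATCH_APPROX)
    exact = tokens_seen(batches, TOKENS_PER_BATCH_EXACT)
    print(
        f"{batches:>5} batches: ~{approx / 1e6:.0f}M tokens "
        f"({exact / 1e6:.1f}M under the 64 x 4096 assumption)"
    )
```

Under the 64 x 4096 assumption, 800 batches come to 209,715,200 tokens, which would explain the "209M" in the run names.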


1.5B and 3.2B: Existing approaches (epoching, parameter scaling)


Plot: Tuning epoch count and parameter count (axis ticks 3.4 to 3.9)
Run set: 1_5bk and 3_2bk (15 runs visualized). Runs are grouped by data.max_train_batches.dclm and then by a numeric value that appears to be the parameter count. Rows named with a bare number are group summaries, and the Runs column gives the count the page shows for each group.

| Name | Runs | eval/dclm/loss | train/loss | Tags | User | State | Runtime | data.max_train_batches.dclm | eval/paloma/c4_en/loss |
|---|---|---|---|---|---|---|---|---|---|
| 3200 | 38 (4 subgroups) | 3.84677 | 3.66853 | | kothasuhas | Finished | 2mo 4d 6h 6m 55s | 3200 | 4.01404 |
| 6400 | 41 (4 subgroups) | 3.50449 | 3.35757 | | kothasuhas | Finished | 1mo 28d 21h 15m 46s | 6400 | 3.66528 |
| 1600 | 37 (4 subgroups) | 4.1104 | 3.85359 | | kothasuhas | Finished | 3mo 2d 21h 21m 52s | 1600 | 4.28488 |
| 800 | 79 (6 subgroups) | 4.62744 | 3.88677 | | kothasuhas | Finished | 3mo 12d 6h 29m 11s | 800 | 4.81479 |
| 1.54147328e+08 | 15 | 4.6113 | 4.41116 | | kothasuhas | Finished | 8d 3h 16m 3s | 800 | 4.78011 |
| 2.99649792e+08 | 25 | 4.60452 | 3.48377 | | kothasuhas | Finished | 23d 10h 8m 45s | 800 | 4.80902 |
| 6.02457088e+08 | 11 | 4.72446 | 4.1618 | | kothasuhas | Finished | 8d 5h 36m 29s | 800 | 4.90609 |
| 1.431373824e+09 | 13 | 4.89631 | 3.77729 | | kothasuhas | Finished | 1mo 7d 14h 37m 4s | 800 | 5.09249 |
| 1_4b4k-209Mx4-dclm-cos-lr0.0003-wd0.10-bs64 | | 3.76432 | 3.23075 | data-efficiency | kothasuhas | Finished | 26m 23s | 800 | 3.92812 |
| 1_4b4k-209Mx8-dclm-cos-lr0.0010-wd0.10-bs64 | | 3.81799 | 2.89524 | data-efficiency, weight-decay-8-4 | kothasuhas | Finished | 52m 56s | 800 | 3.99095 |
| 1_4b4k-209Mx4-dclm-cos-lr0.0010-wd0.10-bs64 | | 3.84467 | 3.60723 | data-efficiency | kothasuhas | Finished | 26m 38s | 800 | 4.00456 |
| 1_4b4k-209Mx4-dclm-cos-lr0.0001-wd0.10-bs64 | | 4.00927 | 3.73687 | data-efficiency | kothasuhas | Finished | 1h 8m 43s | 800 | 4.18317 |
| 1_4b4k-209Mx2-dclm-cos-lr0.0003-wd0.10-bs64 | | 4.04799 | 3.79379 | data-efficiency | kothasuhas | Finished | 48m 4s | 800 | 4.21753 |
| 1_4b4k-209Mx8-dclm-cos-lr0.0003-wd0.10-bs64 | | 4.2074 | 2.14639 | data-efficiency | kothasuhas | Finished | 44m 50s | 800 | 4.41264 |
| 1_4b4k-209Mx1-dclm-cos-lr0.0003-wd0.10-bs64 | | 4.92475 | 4.81925 | data-efficiency | kothasuhas | Finished | 13m 12s | 800 | 5.10174 |
| 1_4b4k-209Mx1-dclm-cos-lr0.0001-wd0.10-bs64 | | 5.00571 | 4.89503 | data-efficiency | kothasuhas | Finished | 37m 32s | 800 | 5.17818 |
| 1_4b4k-209Mx1-dclm-cos-lr0.0010-wd0.10-bs64 | | 5.40783 | 5.2838 | data-efficiency | kothasuhas | Finished | 13m 21s | 800 | 5.58723 |
| 1_4b4k-209Mx16-dclm-cos-lr0.0010-wd0.10-bs64 | | 5.63408 | 1.41124 | data-efficiency | kothasuhas | Finished | 16h 31m 58s | 800 | 5.98333 |
| 1.540732416e+09 | 8 | 4.41734 | 3.77467 | | konwook | Finished | 8h 55s | 800 | 4.58903 |
| 1_5b4k-209Mx4-dclm-cos-lr0.0010-wd0.10-bs64 | | 3.72485 | 3.30732 | data-efficiency | konwook | Finished | 1h 49m 46s | 800 | 3.88607 |
| 1_5b4k-209Mx4-dclm-cos-lr0.0003-wd0.10-bs64 | | 3.80161 | 3.29061 | data-efficiency | konwook | Finished | 1h 50m 28s | 800 | 3.9697 |
| 1_5b4k-209Mx8-dclm-cos-lr0.0010-wd0.10-bs64 | | 3.99835 | 2.45337 | data-efficiency | konwook | Finished | 2h 59m 26s | 800 | 4.18905 |
| 1_5b4k-209Mx4-dclm-cos-lr0.0001-wd0.10-bs64 | | 4.10006 | 3.85942 | data-efficiency | konwook | Finished | 1h 50m 3s | 800 | 4.27705 |
| 1_5b4k-209Mx2-dclm-cos-lr0.0010-wd0.10-bs64 | | 4.1001 | 3.99828 | data-efficiency | konwook | Finished | 1h 15m 41s | 800 | 4.27631 |
| 1_5b4k-209Mx2-dclm-cos-lr0.0003-wd0.10-bs64 | | 4.16552 | 4.03307 | data-efficiency | konwook | Finished | 1h 15m 27s | 800 | 4.33948 |
| 1_5b4k-209Mx8-dclm-cos-lr0.0003-wd0.10-bs64 | | 4.30028 | 2.12241 | data-efficiency | konwook | Finished | 4h 54m 28s | 800 | 4.50899 |
| 1_5b4k-209Mx4-dclm-cos-lr0.0030-wd0.10-bs64 | | 7.14794 | 7.13286 | data-efficiency | konwook | Finished | 1h 50m 30s | 800 | 7.2656 |
| 3.243444224e+09 | 7 | 4.33219 | 4.10162 | | konwook | Finished | 19h 6m 58s | 800 | 4.50855 |
| 3_2b4k-209Mx4-dclm-cos-lr0.0003-wd0.10-bs64 | | 3.74383 | 3.05626 | data-efficiency | konwook | Finished | 12h 11m 41s | 800 | 3.91035 |
| 3_2b4k-209Mx4-dclm-cos-lr0.0001-wd0.10-bs64 | | 3.94154 | 3.50768 | data-efficiency | konwook | Finished | 3h 1m 11s | 800 | 4.11173 |
| 3_2b4k-209Mx4-dclm-cos-lr0.0010-wd0.10-bs64 | | 4.09872 | 3.93742 | data-efficiency | konwook | Finished | 3h 1m 17s | 800 | 4.26905 |
| 3_2b4k-209Mx2-dclm-cos-lr0.0003-wd0.10-bs64 | | 4.17911 | 4.06022 | data-efficiency | konwook | Finished | 2h 5m 15s | 800 | 4.35804 |
| 3_2b4k-209Mx2-dclm-cos-lr0.0001-wd0.10-bs64 | | 4.43318 | 4.30729 | data-efficiency | konwook | Finished | 12h 58m 45s | 800 | 4.61309 |
| 3_2b4k-209Mx2-dclm-cos-lr0.0010-wd0.10-bs64 | | 4.67828 | 4.62266 | data-efficiency | konwook | Finished | 2h 5m 17s | 800 | 4.8755 |
| 3_2b4k-209Mx1-dclm-cos-lr0.0003-wd0.10-bs64 | | 5.25069 | 5.21984 | data-efficiency | konwook | Finished | 1h 37m 14s | 800 | 5.4221 |
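To compare configurations in the run listing above, it can help to split each run name into its parts. The sketch below is a minimal example, assuming the names follow the pattern `<model><context>-<tokens>x<epochs>-<dataset>-<schedule>-lr<lr>-wd<wd>-bs<batch size>`. That reading of the fields is inferred from the names themselves and is not confirmed by the report. The field names and both helper functions are hypothetical, and the sample rows are a handful of (name, eval/dclm/loss) pairs copied from the listing.

```python
import re

# Assumed run-name layout (inferred from the names, not documented in the report).
RUN_NAME = re.compile(
    r"^(?P<model>\d+_\d+b)(?P<context>\d+k)"
    r"-(?P<tokens>\d+M)x(?P<epochs>\d+)"
    r"-(?P<dataset>[a-z0-9]+)-(?P<schedule>[a-z]+)"
    r"-lr(?P<lr>[\d.]+)-wd(?P<wd>[\d.]+)-bs(?P<batch_size>\d+)$"
)


def parse_run_name(name: str) -> dict:
    """Split a run name into fields, converting numeric fields to numbers."""
    match = RUN_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized run name: {name!r}")
    fields = match.groupdict()
    fields["epochs"] = int(fields["epochs"])
    fields["lr"] = float(fields["lr"])
    fields["wd"] = float(fields["wd"])
    fields["batch_size"] = int(fields["batch_size"])
    return fields


def best_run_per_model(rows):
    """Return {model: (name, loss)} with the lowest loss seen per model."""
    best = {}
    for name, loss in rows:
        model = parse_run_name(name)["model"]
        if model not in best or loss < best[model][1]:
            best[model] = (name, loss)
    return best


# A few (run name, eval/dclm/loss) pairs copied from the listing above.
sample_rows = [
    ("1_4b4k-209Mx4-dclm-cos-lr0.0003-wd0.10-bs64", 3.76432),
    ("1_4b4k-209Mx8-dclm-cos-lr0.0010-wd0.10-bs64", 3.81799),
    ("1_5b4k-209Mx4-dclm-cos-lr0.0010-wd0.10-bs64", 3.72485),
    ("1_5b4k-209Mx4-dclm-cos-lr0.0003-wd0.10-bs64", 3.80161),
    ("3_2b4k-209Mx4-dclm-cos-lr0.0003-wd0.10-bs64", 3.74383),
    ("3_2b4k-209Mx4-dclm-cos-lr0.0001-wd0.10-bs64", 3.94154),
]

for model, (name, loss) in best_run_per_model(sample_rows).items():
    fields = parse_run_name(name)
    print(f"{model}: loss={loss} epochs={fields['epochs']} lr={fields['lr']}")
```

Only a subset of rows is included here, so the result reflects the sample rather than the full sweep.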


1.5B and 3.2B: Regularized parameter scaling



Plot: Tuning weight decay
Run sets: 1_5bk and 3_2bk (40), moe (64), norms (49), reg (49)


Downstream