Shehper's workspace
Runs: 38 (7 visualized)

max_iters  n_layer  n_embd  n_head  out_dir                    N        val/loss  train/loss
250000     8        64      2       out-N-3.93e+05             393216   4.53775   4.52522
250000     4        64      2       out-D-9.04e+09-N-1.97e+05  196608   4.71475   4.70398
250000     10       192     3       out-D-9.04e+09-N-4.42e+06  4423680  3.73926   3.74172
250000     8        256     4       out-N-6.29e+06             6291456  3.6224    3.59586
250000     8        128     2       out-N-1.57e+06             1572864  4.0669    4.06087
250000     4        128     2       out-D-9.04e+09-N-7.86e+05  786432   4.23614   4.23197
250000     8        32      2       out-N-9.83e+04             98304    5.19576   5.19791
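
The N values in the runs listed above are consistent with the common approximation N = 12 * n_layer * n_embd^2 for the non-embedding parameter count of a transformer. This is an inference from the seven rows shown, not something the workspace states. A minimal sketch that checks the relation against the listed values (the function name `approx_param_count` is hypothetical):

```python
def approx_param_count(n_layer: int, n_embd: int) -> int:
    """Approximate non-embedding parameter count.

    Assumed formula (not stated in the workspace): 12 * n_layer * n_embd**2.
    """
    return 12 * n_layer * n_embd ** 2


# (n_layer, n_embd, N) copied from the seven runs listed above.
RUNS = [
    (8, 64, 393216),
    (4, 64, 196608),
    (10, 192, 4423680),
    (8, 256, 6291456),
    (8, 128, 1572864),
    (4, 128, 786432),
    (8, 32, 98304),
]

# Collect any run whose logged N differs from the assumed formula.
mismatches = [(l, d, n) for l, d, n in RUNS if approx_param_count(l, d) != n]
print(mismatches)  # → []
```

Under this assumption, n_head does not affect N, which matches the rows above: runs with different head counts still follow the same relation.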