Drscotthawley's workspace
Runs (4)
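| Field         | Run 1         | Run 2         | Run 3         | Run 4         |
|---------------|---------------|---------------|---------------|---------------|
| State         | Finished      | Finished      | Finished      | Crashed       |
| Notes         | -             | -             | -             | -             |
| User          | drscotthawley | drscotthawley | drscotthawley | drscotthawley |
| Tags          |               |               |               |               |
| Created       |               |               |               |               |
| Runtime       | 6m 21s        | 2m 24s        | 16m 36s       | 19h 53m 13s   |
| Sweep         | -             | -             | -             | -             |
| batch_size    | 256           | 128           | 128           | 256           |
| bias          | false         | false         | false         | false         |
| dropout       | 0.5           | 0.1           | 0.1           | 0.5           |
| epochs        | 12            | 20            | 20            | 60            |
| learning_rate | 0.001         | 0.001         | 0.001         | 0.001         |
| n_blocks      | 8             | 4             | 4             | 8             |
| n_embd        | 256           | 128           | 128           | 256           |
| n_heads       | 16            | 8             | 8             | 16            |
| seq_length    | 128           | 64            | 64            | 128           |
| vocab_sizes   | [128,15,28]   | [128,15,28]   | [128,15,28]   | [128,15,28]   |
| weight_decay  | 0.04          | 0.01          | 0.01          | 0.04          |
| use_alibi     | false         | false         | -             | -             |
| use_mamba     | true          | true          | -             | -             |
| epoch         | 2             | 8             | 20            | 14            |
| step          | 382           | 1656          | 4580          | 3026          |
| train         | 0.66084       | 0.81879       | 1.00426       | 0.76521       |
| train_ema     | 0.71361       | 0.86366       | 0.99505       | 0.74835       |
| val           | 1.54811       | 1.47952       | 1.14948       | 1.11198       |
| val_ema       | 1.48589       | 1.53342       | 1.19745       | 1.13289       |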