Ve-forbryderne's workspace
Runs: 4

Run metadata

| name | phase4 | phase3 | phase2 | phase1 |
|---|---|---|---|---|
| State | Finished | Finished | Finished | Finished |
| Notes | - | - | - | - |
| User | ve-forbryderne | ve-forbryderne | ve-forbryderne | ve-forbryderne |
| Tags | | | | |
| Created | | | | |
| Runtime | 2h 29m 4s | 1h 33m 49s | 11h 46m 35s | 6h 25m 34s |
| Sweep | - | - | - | - |

Config

| name | phase4 | phase3 | phase2 | phase1 |
|---|---|---|---|---|
| anneal_steps | 396 | 234 | 2042 | 1111 |
| bucket | q00q | q00q | q00q | q00q |
| ckpt_every | 9999999 | 9999999 | 9999999 | 9999999 |
| comment | | | | |
| compat | neox | neox | neox | neox |
| cores_per_replica | 32 | 32 | 32 | 32 |
| d_model | 6144 | 6144 | 6144 | 6144 |
| end_lr | 1e-6 | 4e-7 | 1e-6 | 4e-7 |
| eval_harness_tasks | [] | [] | [] | [] |
| gradient_accumulation_steps | 32 | 32 | 32 | 32 |
| keep_every | 9999999 | 9999999 | 9999999 | 9999999 |
| layers | 44 | 44 | 44 | 44 |
| lr | 5e-6 | 4e-6 | 1e-5 | 4e-6 |
| model_dir | 20B-32 | 20B-32 | 20B-32 | 20B-32 |
| n_heads | 64 | 64 | 64 | 64 |
| n_vocab | 50432 | 50432 | 50432 | 50432 |
| norm | layernorm | layernorm | layernorm | layernorm |
| pe | neox_rotary | neox_rotary | neox_rotary | neox_rotary |
| pe_rotary_dims | 24 | 24 | 24 | 24 |
| per_replica_batch | 1 | 1 | 1 | 1 |
| seq | 2048 | 2048 | 2048 | 2048 |
| total_steps | 441 | 246 | 2268 | 1234 |
| tpu_size | 32 | 32 | 32 | 32 |
| train_set | 20B-skein-phase4.train.index | 20B-skein-phase1.train.index | 20B-skein-phase2.train.index | 20B-skein-phase1.train.index |
| val_batches | 9999999 | 9999999 | 9999999 | 9999999 |
| val_every | 9999999 | 9999999 | 9999999 | 9999999 |
| wandb_project | skein-20b | skein-20b | skein-20b | skein-20b |
| warmup_steps | 45 | 12 | 226 | 123 |
| weight_decay | 0.1 | 0.1 | 0.1 | 0.1 |

Summary metrics

| name | phase4 | phase3 | phase2 | phase1 |
|---|---|---|---|---|
| noise/G_noise_avg | 0.057908 | 0.31875 | 0.0097726 | 0.050359 |
| noise/S_noise_avg | 41.65419 | 42.05107 | 42.85703 | 43.05916 |
| noise/grad_noise_scale | 719.32216 | 131.92402 | 4385.4226 | 855.04619 |
| train/grad_norm | 1.125 | 1.19531 | 1.16406 | 1.17969 |
| train/last_loss | 2.72975 | 1.78694 | 2.88541 | 2.36053 |
| train/learning_rate | 1e-6 | 4e-7 | 1e-6 | 4e-7 |
| train/loss | 2.38408 | 1.97396 | 2.43648 | 1.94151 |
| train/steps_per_sec | 0.054733 | 0.054737 | 0.054727 | 0.054743 |
| train/tokens_per_sec | 3586.97513 | 3587.2237 | 3586.61054 | 3587.61731 |
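Two summary columns in these runs can be recomputed from other logged values. The minimal sketch below rests on two assumptions that the table itself does not state. First, one optimizer step consumes per_replica_batch × (tpu_size / cores_per_replica) × gradient_accumulation_steps sequences of seq tokens. With the shared config this comes to 1 × 1 × 32 × 2048 = 65,536 tokens, so steps_per_sec × 65,536 should reproduce train/tokens_per_sec. Second, noise/grad_noise_scale is the ratio S / G of the two logged noise averages. Both checks agree with the logged values to within rounding. RUNS, tokens_per_step and derived are hypothetical names introduced only for this illustration.

```python
# Summary values copied from the runs table (one entry per run).
RUNS = {
    "phase4": dict(steps_per_sec=0.054733, tokens_per_sec=3586.97513,
                   G=0.057908, S=41.65419, grad_noise_scale=719.32216),
    "phase3": dict(steps_per_sec=0.054737, tokens_per_sec=3587.2237,
                   G=0.31875, S=42.05107, grad_noise_scale=131.92402),
    "phase2": dict(steps_per_sec=0.054727, tokens_per_sec=3586.61054,
                   G=0.0097726, S=42.85703, grad_noise_scale=4385.4226),
    "phase1": dict(steps_per_sec=0.054743, tokens_per_sec=3587.61731,
                   G=0.050359, S=43.05916, grad_noise_scale=855.04619),
}

# Config shared by all four runs.
TPU_SIZE = 32
CORES_PER_REPLICA = 32
PER_REPLICA_BATCH = 1
GRADIENT_ACCUMULATION_STEPS = 32
SEQ = 2048


def tokens_per_step() -> int:
    """Tokens consumed per optimizer step under the assumed batch layout."""
    # Assumption: number of data-parallel replicas = tpu_size / cores_per_replica.
    replicas = TPU_SIZE // CORES_PER_REPLICA
    return PER_REPLICA_BATCH * replicas * GRADIENT_ACCUMULATION_STEPS * SEQ  # 65536


def derived(run: dict) -> tuple:
    """Recompute tokens/sec and the noise scale from other logged columns."""
    predicted_tokens_per_sec = run["steps_per_sec"] * tokens_per_step()
    # Assumption: grad_noise_scale is the ratio S / G of the two noise averages.
    predicted_noise_scale = run["S"] / run["G"]
    return predicted_tokens_per_sec, predicted_noise_scale


for name, run in RUNS.items():
    tps, noise = derived(run)
    print(f"{name}: tokens/s {tps:.2f} vs logged {run['tokens_per_sec']:.2f}; "
          f"S/G {noise:.2f} vs logged {run['grad_noise_scale']:.2f}")
```

The small remaining differences come from the table rounding steps_per_sec and G_noise_avg to a handful of significant digits.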
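The scheduler columns are consistent with one another. In every run anneal_steps = total_steps - warmup_steps (for example 441 - 45 = 396 for phase4), and the final train/learning_rate equals end_lr. The table does not record the shape of the schedule between those points. The sketch below therefore assumes a linear warmup from 0 to lr over warmup_steps, followed by a cosine anneal from lr down to end_lr over anneal_steps. The cosine shape and the zero starting rate are assumptions, and CONFIGS and lr_at are hypothetical names.

```python
import math

# Config columns from the runs table; anneal_steps == total_steps - warmup_steps in every run.
CONFIGS = {
    "phase4": dict(lr=5e-6, end_lr=1e-6, warmup_steps=45, anneal_steps=396, total_steps=441),
    "phase3": dict(lr=4e-6, end_lr=4e-7, warmup_steps=12, anneal_steps=234, total_steps=246),
    "phase2": dict(lr=1e-5, end_lr=1e-6, warmup_steps=226, anneal_steps=2042, total_steps=2268),
    "phase1": dict(lr=4e-6, end_lr=4e-7, warmup_steps=123, anneal_steps=1111, total_steps=1234),
}


def lr_at(step: int, lr: float, end_lr: float, warmup_steps: int, anneal_steps: int) -> float:
    """Assumed schedule: linear warmup to lr, then cosine anneal down to end_lr."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return lr * step / warmup_steps
    # Fraction of the anneal phase completed, clamped to at most 1.
    progress = min(step - warmup_steps, anneal_steps) / anneal_steps
    # Half-cosine from lr (progress 0) down to end_lr (progress 1).
    return end_lr + (lr - end_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


for name, cfg in CONFIGS.items():
    final = lr_at(cfg["total_steps"], cfg["lr"], cfg["end_lr"],
                  cfg["warmup_steps"], cfg["anneal_steps"])
    print(f"{name}: peak {cfg['lr']:.1e}, lr at total_steps {final:.1e}")
```

Under this assumption the rate at total_steps is exactly end_lr. That matches the logged train/learning_rate for all four runs, although a linear anneal would match it equally well.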