Morgan's workspace
Runs
27
| Runtime | State | User | lr | train/loss | val/loss_prosecraft_val_ft | anneal_steps | temp | top_p | tokens_processed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6d 20h 4m 9s | Finished | morgan | 0.00002 | 2.15559 | 2.22871 | 43120 | 1 | 0.9 | 2830827520 |
| 5d 23h 30m 12s | Crashed | morgan | 0.00001 | 2.17868 | 2.22513 | 43120 | 1 | 0.9 | 2478768128 |
| 22m 1s | Finished | morgan | 0.00001 | - | - | 42895 | 0.9 | 0.9 | - |
| 21m 41s | Finished | morgan | 0.00001 | - | - | 42895 | 1 | 0.9 | - |
| 23m 14s | Finished | morgan | 0.00001 | - | - | 42895 | - | - | - |
| 21m 24s | Finished | morgan | 0.00001 | - | - | 42895 | - | - | - |
| 21m 52s | Failed | morgan | 0.00001 | - | - | 42895 | - | - | - |
| 23m 43s | Crashed | morgan | 0.00001 | - | - | 42895 | - | - | - |
| 24m 4s | Crashed | morgan | 0.00001 | - | - | 42895 | - | - | - |
| 22m 4s | Finished | morgan | 0.00001 | - | - | 42895 | - | - | - |
| 19d 16h 37m 32s | Finished | morgan | 0.00001 | 2.21221 | 2.29307 | 20000 | - | - | 1339949056 |
| 5d 12h | Crashed | morgan | 0.00001 | 2.21777 | 2.38704 | 42895 | - | - | - |
| 23d 3h 7m 9s | Finished | morgan | 0.00001 | 1.92378 | - | 814 | - | - | - |
| 18h 28m 13s | Finished | morgan | 0.00001 | 0.10695 | - | 7700 | - | - | - |
| 1h 27m 6s | Killed | morgan | 0.00001 | 2.65913 | - | 7700 | - | - | - |
| 18h 32m 3s | Finished | morgan | 0.00001 | 0.089045 | - | 7410 | - | - | - |
| 2d 19h 13m 3s | Finished | morgan | 0.00001 | 2.10686 | - | 18450 | - | - | - |
| 16h 13s | Finished | morgan | 0.00001 | 0.63784 | - | 3510 | - | - | - |
| 2h 40m 32s | Finished | morgan | 0.00005 | 2.87884 | - | 290 | - | - | - |
| 4d 3h 7m 13s | Crashed | morgan | 0.000025 | 2.21015 | - | 35438 | - | - | - |
Notes
Tags
Created
Sweep
bucket
ckpt_every
comment
cores_per_replica
d_model
end_lr
eval_harness_tasks
gradient_accumulation_steps
keep_every
layers
model_dir
n_heads
n_repeats
n_vocab
name
norm
optimizer
pe
pe_rotary_dims
per_replica_batch
prompts
prompts_filename
prompts_path
seq
total_steps
tpu_size
train_set
val_batches
val_every
val_set.pc_testing
val_set.prosecraft_val_ft
val_set.prosecraft_val_old
wandb_project
warmup_steps
weight_decay
noise/B_simple
noise/G_noise
noise/G_noise_avg
noise/S_noise
noise/S_noise_avg
sequences_processed
-
-
prosecraft-storage
1000
200-step warmup followed by linear decay: linear_onecycle_schedule(total_steps, lr, 0.0046, 1.0, 100.0, 10000). Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
1.0000e-8
[]
32
5000
28
prosecraft_linear
16
1
50400
prosecraft_linear
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
prompts/prompts.csv
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
-
-
200
0.1
2408.3673
0.016843
0.004553
10.89121
10.96529
1382240
-
-
prosecraft-storage
1000
Continuing training from prosecraft_ft_resumed. Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
1.0000e-10
[]
32
5000
28
prosecraft_resumed_ft2
16
1
50400
prosecraft_ft_resumed2
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
prompts/prompts.csv
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
-
-
75
0.1
1539.2122
0.017511
0.0086259
12.73311
13.27715
1210336
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
3
50400
prompt_temp0.9_prosecraft_ft_resumed_20001
layernorm
-
rotary
64
1
prompts.csv
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
3
50400
prompt_temp1.0_prosecraft_ft_resumed_20001
layernorm
-
rotary
64
1
prompts.csv
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
3
50400
prompt_prosecraft_ft_resumed_20001
layernorm
-
rotary
64
1
prompts.csv
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
3
50400
prompt_prosecraft_ft_resumed_20001
layernorm
-
rotary
64
1
prompts.csv
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
3
50400
prompt_prosecraft_ft_resumed_20001
layernorm
-
rotary
64
1
prompts.csv
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
3
50400
prompt_prosecraft_ft_resumed_20001
layernorm
-
rotary
64
1
prompts.csv
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
3
50400
prompt_prosecraft_ft_resumed_20001
layernorm
-
rotary
64
1
prompts.csv
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
3
50400
prosecraft_ft_resumed_slim_20001
layernorm
-
rotary
64
1
prompts.csv
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
500
Resuming; training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft_resumed
16
-
50400
prosecraft_ft_resumed
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
21000
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
1
0.1
1371.90955
0.024784
0.0092266
13.65221
12.6581
654272
-
-
prosecraft-storage
500
Training using the shuffled train dataset, with the old and new val datasets, 1,382,217 sequences / 32 batch size == 43195 steps
8
4096
0.000001
[]
32
5000
28
prosecraft_ft
16
-
50400
prosecraft_ft
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
43195
8
prosecraft_ft.train.index
2000
500
-
prosecraft_ft.val.index
prosecraft.val.index
-
300
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
40
1 epoch, new fine-tuned train dataset, bs 16 (854 steps) training, 13651 sequences / 16 == 854
8
4096
0.000001
[]
16
160
28
prosecraft_samples_ft
16
-
50400
samples_ft_16bs_1e_1e-5
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
854
8
samples_ft.train.index
2000
40
samples.val.index
-
-
-
40
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
200
10 epochs with 0.2 weight decay, bs 16 (7800 steps) training, 12488 sequences / 16 == 780 steps per epoch
8
4096
0.000001
[]
16
600
28
prosecraft_samples
16
-
50400
samples_wd_16bs_10e_1e-5
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
7800
8
samples.train.index
2000
200
samples.val.index
-
-
-
100
0.2
-
-
-
-
-
-
-
-
prosecraft-storage
200
10 epochs with 0.2 weight decay, bs 16 (7800 steps) training, 12488 sequences / 16 == 780 steps per epoch
8
4096
0.000001
[]
16
600
28
mesh_jax_pile_6B_rotary
16
-
50400
samples_wd_16bs_10e_1e-5
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
7800
8
samples.train.index
2000
200
samples.val.index
-
-
-
100
0.2
-
-
-
-
-
-
-
-
prosecraft-storage
200
10 epochs with bs 16 (7800 steps) training, 12488 sequences / 16 == 780 steps per epoch
8
4096
0.000005
[]
16
600
28
mesh_jax_pile_6B_rotary
16
-
50400
samples_16bs_10e_1e-5
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
7800
8
samples.train.index
2000
200
samples.val.index
-
-
-
390
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
1000
Resumed from step 28k
8
4096
0.000001
[]
32
3000
28
mesh_jax_pile_6B_rotary
16
-
50400
resumed_28k_prosecraft_GPT3
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
18650
8
prosecraft.train.index
2000
500
prosecraft.val.index
-
-
-
200
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
200
10 epochs (3900 steps) training, 12488 sequences / 32 == 390 steps per epoch
8
4096
0.000005
[]
32
600
28
mesh_jax_pile_6B_rotary
16
-
50400
samples_10e_3900s_1e-5
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
3900
8
samples.train.index
2000
200
samples.val.index
-
-
-
390
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
100
1 epoch (390 steps) training, 12488 sequences / 32 == 390
8
4096
0.000005
[]
32
300
28
mesh_jax_pile_6B_rotary
16
-
50400
prosecraft_samples_GPT3_6B
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
390
8
samples.train.index
2000
30
samples.val.index
-
-
-
100
0.1
-
-
-
-
-
-
-
-
prosecraft-storage
1000
8
4096
0.0000025
[]
32
3000
28
mesh_jax_pile_6B_rotary
16
-
50400
prosecraft_GPT3_6B_pile_rotary
layernorm
["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"]
rotary
64
1
-
-
-
2048
37303
8
prosecraft.train.index
2000
1000
prosecraft.val.index
-
-
-
1865
0.1
-
-
-
-
-
-
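The step counts quoted in the comment fields come from dividing a sequence count by the batch size, and the tokens_processed values appear to equal sequences_processed times seq (2048). The sketch below checks that arithmetic against the numbers in the table. The helper names are illustrative and not part of the training code, and the rounding rule is an assumption: the quoted counts match ceiling division in some rows (43195, 854) and floor division in others (780, 390), so both are shown.

```python
import math

SEQ_LEN = 2048  # value of the `seq` column in every row


def steps_per_epoch(n_sequences: int, batch_size: int) -> tuple[int, int]:
    """Return (floor, ceil) step counts for one pass over the data."""
    return n_sequences // batch_size, math.ceil(n_sequences / batch_size)


def tokens_from_sequences(sequences_processed: int, seq_len: int = SEQ_LEN) -> int:
    """Tokens seen so far, assuming every sequence holds seq_len tokens."""
    return sequences_processed * seq_len


print(steps_per_epoch(1_382_217, 32))    # (43194, 43195)
print(steps_per_epoch(13_651, 16))       # (853, 854)
print(steps_per_epoch(12_488, 16))       # (780, 781)
print(steps_per_epoch(12_488, 32))       # (390, 391)
print(tokens_from_sequences(1_382_240))  # 2830827520
```

The last line reproduces the tokens_processed value of the first run from its sequences_processed value; the same product also matches the two other runs that report both metrics.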