Garg-aayush's workspace
Runs (6)

| Field | gpt2-swiglu | gpt2-rope | gpt2-global-datafix | gpt2-lr-inc | gpt2-periodicity-fix | gpt2-baseline |
|---|---|---|---|---|---|---|
| State | Crashed | Failed | Failed | Finished | Finished | Finished |
| Notes | - | - | - | - | - | - |
| User | garg-aayush | garg-aayush | garg-aayush | garg-aayush | garg-aayush | garg-aayush |
| Tags | | | | | | |
| Created | | | | | | |
| Runtime | 1h 33m 31s | 1h 56m 8s | 1h 49m 3s | 1h 55m 53s | 1h 55m 59s | 1h 50m 19s |
| Sweep | - | - | - | - | - | - |
| B | 64 | 64 | 64 | 64 | 64 | 64 |
| T | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 |
| ckpt_dir | /workspace/ckpt | /workspace/ckpt | /workspace/ckpt | /workspace/ckpt | /workspace/ckpt | /workspace/ckpt |
| data_root | /workspace/shards | /workspace/shards | /workspace/shards | /workspace/shards | /workspace/shards | /workspace/shards |
| data_seed | 1337 | 1337 | 1337 | 1337 | 1337 | - |
| device | cuda | cuda | cuda | cuda | cuda | cuda |
| eval_interval | 250 | 250 | 250 | 250 | 250 | 250 |
| flops_promised | 989000000000000 | 989000000000000 | 989000000000000 | 989000000000000 | 989000000000000 | 989000000000000 |
| grad_norm_clip | 1 | 1 | 1 | 1 | 1 | 1 |
| log_interval | 1 | 1 | 1 | 1 | 1 | 1 |
| max_lr | 0.0015 | 0.0015 | 0.0015 | 0.0015 | 0.0006 | 0.0006 |
| max_seq_len | 32 | 32 | 32 | 32 | 32 | 32 |
| max_steps | 19073 | 19073 | 19073 | 19073 | 19073 | 19073 |
| min_lr | 0.00015 | 0.00015 | 0.00015 | 0.00015 | 0.00006 | 0.00006 |
| n_embd | 768 | 768 | 768 | 768 | 768 | 768 |
| n_head | 12 | 12 | 12 | 12 | 12 | 12 |
| n_layer | 12 | 12 | 12 | 12 | 12 | 12 |
| num_return_sequences | 4 | 4 | 4 | 4 | 4 | 4 |
| run_gen_samples | false | false | false | false | true | true |
| run_hellaswag | true | true | true | true | true | true |
| run_validation | true | true | true | true | true | true |
| seed | 42 | 42 | 42 | 42 | 42 | 42 |
| start_seq | Hello, I'm a language model, | Hello, I'm a language model, | Hello, I'm a language model, | Hello, I'm a language model, | Hello, I'm a language model, | Hello, I'm a language model, |
| total_batch | 524288 | 524288 | 524288 | 524288 | 524288 | 524288 |
| use_compile | true | true | true | true | true | true |
| val_loss_steps | 20 | 20 | 20 | 20 | 20 | 20 |
| vocab_size | 50304 | 50304 | 50304 | 50304 | 50304 | 50304 |
| wandb_project | pre-training | pre-training | pre-training | pre-training | pre-training | pre-training |
| wandb_run_name | gpt2-swiglu | gpt2-rope | gpt2-global-datafix | gpt2-lr-inc | gpt2-periodicity-fix | gpt2-baseline |
| warmup_steps | 300 | 300 | 300 | 300 | 715 | 715 |
| weight_decay | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| dt | 325.42872 | 17789.62946 | 16740.42153 | 10728.30844 | 10612.93912 | 10666.78572 |
| mfu | 0.25452 | 0.23327 | 0.34117 | 0.34637 | 0.34485 | 0.34441 |
| step | 14697 | 18794 | 18794 | 19072 | 19072 | 19072 |
| tok/s | 1611068.60234 | 29471.55258 | 31318.68567 | 48869.58675 | 49400.82988 | 49151.45142 |
| train/loss | 3.03232 | 2.96514 | 2.98157 | 3.01242 | 3.05505 | 3.08455 |
| train/lr | 0.00032305 | 0.00015074 | 0.00015074 | 0.00015 | 0.00006 | 0.00006 |
| train/norm | 0.1391 | 0.22232 | 0.26447 | 0.21636 | 0.29316 | 0.31972 |
| val/hella_norm | 0.31747 | 0.31976 | 0.31468 | 0.31149 | 0.30392 | 0.30263 |
| val/loss | 3.03106 | 2.98739 | 3.0045 | 3.02112 | 3.06392 | 3.06575 |
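
The logged columns can be cross-checked against each other with a little arithmetic. The sketch below is not taken from the training script, which is not shown here; it rests on three labeled assumptions: that `dt` is the step time in milliseconds, that `tok/s` is `total_batch` divided by that step time, and that `train/lr` follows linear warmup followed by cosine decay from `max_lr` to `min_lr`. The function names are hypothetical. Under those assumptions the values reproduce the logged `tok/s` and `train/lr` for the runs checked.

```python
import math


def tokens_per_second(total_batch: int, dt_ms: float) -> float:
    """Throughput, ASSUMING dt is the wall-clock time of one step in ms."""
    return total_batch / (dt_ms / 1000.0)


def micro_batches_per_step(total_batch: int, B: int, T: int) -> int:
    """Number of (B x T) micro-batches that make up one optimizer step.

    How these are split between gradient accumulation and devices is not
    stated in the logged config, so only the product is computed here.
    """
    assert total_batch % (B * T) == 0, "total_batch must be a multiple of B*T"
    return total_batch // (B * T)


def cosine_lr(step: int, max_lr: float, min_lr: float,
              warmup_steps: int, max_steps: int) -> float:
    """ASSUMED schedule: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step > max_steps:
        return min_lr
    decay_ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)


if __name__ == "__main__":
    # gpt2-baseline: dt = 10666.78572, logged tok/s = 49151.45142
    print(tokens_per_second(524288, 10666.78572))
    # All runs: B = 64, T = 1024, total_batch = 524288
    print(micro_batches_per_step(524288, 64, 1024))
    # gpt2-swiglu at step 14697: logged train/lr = 0.00032305
    print(cosine_lr(14697, 0.0015, 0.00015, 300, 19073))
```

The same throughput check explains the outlier in gpt2-swiglu: its last logged `dt` of 325.42872 gives roughly 1.6 million tok/s, matching the logged value, so that figure reflects an unusually short final step rather than a different formula.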