Thecr7guy3's workspace
Runs
8
Columns without values in this export: Name, Notes, Tags, Created.

| Field | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| State | Finished | Finished | Finished | Finished | Finished | Finished | Finished | Finished |
| User | thecr7guy3 | thecr7guy3 | thecr7guy3 | thecr7guy3 | thecr7guy3 | thecr7guy3 | thecr7guy3 | thecr7guy3 |
| Runtime | 5h 59m 16s | 8m 11s | 6m 21s | 3m 46s | 5m 15s | 5m 57s | 7m 12s | 9m 45s |
| Sweep | - | - | - | - | - | - | - | - |
| activation | gelu | gelu | gelu | gelu | gelu | gelu | gelu | gelu |
| batch_size | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| beta1 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 |
| beta2 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| cache_dir | - | ./data | ./data | ./data | ./data | ./data | ./data | ./data |
| checkpoint_dir | - | checkpoints | checkpoints | checkpoints | checkpoints | checkpoints | checkpoints | checkpoints |
| dataset | <test_model.Config object at 0x76fe3eb88ad0> | <test_model.Config object at 0x7e260f197d50> | <test_model.Config object at 0x709e957a0710> | <test_model.Config object at 0x79495ece4f10> | <test_model.Config object at 0x7bf94b397d50> | <test_model.Config object at 0x77b1da3ab010> | <test_model.Config object at 0x7fd538993c10> | <test_model.Config object at 0x7983a1f9ff90> |
| dropout | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| epochs | 10 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| generate_interval | 1000 | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 | 5000 |
| grad_clip | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| log_interval | 5 | - | - | - | - | - | - | - |
| log_model_summary | - | true | true | true | true | true | true | true |
| max_lr | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0006 |
| mega_batch_size | - | - | - | 512 | 512 | 512 | 512 | 512 |
| min_lr | 0.00006 | 0.00006 | 0.00006 | 0.00006 | 0.00006 | 0.00006 | 0.00006 | 0.00006 |
| n_embd | 768 | 768 | 768 | 768 | 768 | 768 | 768 | 768 |
| n_heads | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| n_layer | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| optimizer | - | adamw | adamw | adamw | adamw | adamw | adamw | adamw |
| sample_max_tokens | - | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| save_every | - | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| seed | 42 | 42 | 42 | 42 | 42 | 42 | 42 | 42 |
| seq_len | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 |
| total_batch_size | 524288 | - | - | - | - | - | - | - |
| vocab_size | 50304 | 50304 | 50304 | 50304 | 50304 | 50304 | 50304 | 50304 |
| wandb | <test_model.Config object at 0x76fe3c1db550> | <test_model.Config object at 0x7e260f1a2390> | <test_model.Config object at 0x709e957a0650> | <test_model.Config object at 0x7949dd997e10> | <test_model.Config object at 0x7bf94b397c90> | <test_model.Config object at 0x77b1da3a8b50> | <test_model.Config object at 0x7fd538993b50> | <test_model.Config object at 0x7983a1f9ff50> |
| warmup_steps | 720 | 720 | 720 | 720 | 720 | 720 | 720 | 720 |
| weight_decay | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| epoch | 9 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| lr | - | 0.0003 | - | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 |
| train/average_throughput | 84294.34879 | 23463.25295 | 29079.14996 | 40825.99139 | 28508.75805 | 14369.76011 | 11528.94983 | 8308.87798 |
| train/avg_loss | 3.60601 | 5.60519 | 7.58708 | 5.69776 | 5.74601 | 5.69823 | 5.68316 | 5.67997 |
| train/grad_norm | 0.26473 | - | 0.9791 | - | - | - | - | - |
| train/lr | 0.00006 | - | 0.00006 | - | - | - | - | - |
| train/perplexity | 36.81888 | 271.83331 | 1972.539 | 298.19749 | 312.93969 | 298.34017 | 293.87537 | 292.93929 |
| train/step_loss | 3.61163 | 5.73147 | 7.55811 | 5.65534 | 5.69425 | 5.6488 | 5.64226 | 5.64278 |
| train/step_throughput | 84641.38646 | 23876.25002 | 26947.37688 | 40989.25917 | 28626.22116 | 14411.83792 | 11573.71304 | 8318.13939 |
| train/time_per_step | 6194.22746 | 343.10246 | 19455.99389 | 199.85723 | 286.1712 | 568.4216 | 707.81088 | 984.83562 |
| train/total_tokens_seen | 1735393280 | 8445952 | 7864320 | 4329472 | 4329472 | 4329472 | 4329472 | 4329472 |
| validation/avg_loss | 3.77598 | 5.71889 | 7.52446 | 5.91954 | 5.9424 | 5.9252 | 5.92167 | 5.91833 |
| validation/perplexity | 43.64037 | 304.56666 | 1852.80599 | 372.24219 | 380.84907 | 374.35301 | 373.03342 | 371.79178 |
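
The perplexity columns appear to be the exponential of the matching avg_loss columns: for the first run listed, exp(3.60601) is about 36.82 and exp(3.77598) is about 43.64. The sketch below checks that relationship and derives two rough counts for the same run. It assumes the losses are mean cross-entropy in nats and that total_batch_size counts tokens per optimizer step; neither is stated in this export, and the helper names are hypothetical, not taken from the training code.

```python
import math


def perplexity(avg_loss: float) -> float:
    """Perplexity implied by a mean cross-entropy loss (assumed to be in nats)."""
    return math.exp(avg_loss)


def optimizer_steps(total_tokens_seen: int, total_batch_size: int) -> int:
    """Step count, ASSUMING total_batch_size is tokens per optimizer step."""
    return total_tokens_seen // total_batch_size


def micro_batches_per_step(total_batch_size: int, batch_size: int, seq_len: int) -> int:
    """Micro-batches needed per step, ASSUMING each micro-batch holds
    batch_size sequences of seq_len tokens."""
    return total_batch_size // (batch_size * seq_len)


# Values copied from the first run listed (the only one reporting total_batch_size).
train_ppl = perplexity(3.60601)   # logged train/perplexity: 36.81888
val_ppl = perplexity(3.77598)     # logged validation/perplexity: 43.64037
steps = optimizer_steps(1735393280, 524288)
micro = micro_batches_per_step(524288, 8, 1024)

print(f"train perplexity ~ {train_ppl:.3f}, validation perplexity ~ {val_ppl:.3f}")
print(f"steps = {steps}, micro-batches per step = {micro}")
```

Under those assumptions the first run covers 3310 steps of 64 micro-batches each. The same perplexity check can be repeated for the other runs by substituting their logged avg_loss values.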