Kabachuha's workspace
Runs (3)

| Field | mini-gpt-boosted | mini-gpt | mini-gpt-kan |
| --- | --- | --- | --- |
| State | Finished | Finished | Finished |
| Notes | - | - | - |
| User | kabachuha | kabachuha | kabachuha |
| Tags | | | |
| Created | | | |
| Runtime | 8m 16s | 1m 23s | 35m 56s |
| Sweep | - | - | - |
| always_save_checkpoint | false | false | false |
| backend | nccl | nccl | nccl |
| batch_size | 64 | 64 | 64 |
| beta1 | 0.9 | 0.9 | 0.9 |
| beta2 | 0.99 | 0.99 | 0.99 |
| bias | false | false | false |
| block_size | 256 | 256 | 256 |
| compile | false | false | false |
| dataset | shakespeare_char | shakespeare_char | shakespeare_char |
| decay_lr | true | true | true |
| device | cuda | cuda | cuda |
| dropout | 0.2 | 0.2 | 0.2 |
| dtype | bfloat16 | bfloat16 | bfloat16 |
| eval_interval | 250 | 250 | 250 |
| eval_iters | 200 | 200 | 200 |
| eval_only | false | false | false |
| grad_clip | 1 | 1 | 1 |
| gradient_accumulation_steps | 1 | 1 | 1 |
| init_from | scratch | scratch | scratch |
| learning_rate | 0.001 | 0.001 | 0.001 |
| log_interval | 10 | 10 | 10 |
| lr_decay_iters | 5000 | 5000 | 5000 |
| max_iters | 5000 | 5000 | 5000 |
| min_lr | 0.0001 | 0.0001 | 0.0001 |
| n_embd | 192 | 192 | 192 |
| n_head | 6 | 6 | 6 |
| n_layer | 16 | 2 | 2 |
| out_dir | out-shakespeare-char-boosted | out-shakespeare-char | out-shakespeare-char |
| wandb_log | true | true | true |
| wandb_project | shakespeare-char | shakespeare-char | shakespeare-char |
| wandb_run_name | mini-gpt-boosted | mini-gpt | mini-gpt-kan |
| warmup_iters | 100 | 100 | 100 |
| weight_decay | 0.1 | 0.1 | 0.1 |
| iter | 5000 | 5000 | 5000 |
| lr | 0.0001 | 0.0001 | 0.0001 |
| mfu | 4.20265 | 3.80027 | 1.03221 |
| train/loss | 0.79017 | 1.3042 | 1.29722 |
| val/loss | 1.58425 | 1.53787 | 1.55397 |
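The logged `lr` of 0.0001 at `iter` 5000 equals `min_lr`, which is what a warmup-plus-decay schedule would produce once `lr_decay_iters` is reached. The sketch below is one way such a schedule could look, assuming linear warmup followed by cosine decay. The page does not state the schedule shape, so `get_lr` and `tokens_per_iter` are hypothetical helpers, and the token count assumes a single process. Only the constants are taken from the table.

```python
import math

# Values copied from the runs table (identical across all three runs).
LEARNING_RATE = 1e-3
MIN_LR = 1e-4
WARMUP_ITERS = 100
LR_DECAY_ITERS = 5000
BATCH_SIZE = 64
BLOCK_SIZE = 256
GRAD_ACCUM_STEPS = 1


def get_lr(it: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay down to MIN_LR."""
    if it < WARMUP_ITERS:
        # Linear ramp from 0 up to LEARNING_RATE.
        return LEARNING_RATE * it / WARMUP_ITERS
    if it >= LR_DECAY_ITERS:
        # Past the decay horizon the rate stays at the floor.
        return MIN_LR
    progress = (it - WARMUP_ITERS) / (LR_DECAY_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)


def tokens_per_iter() -> int:
    """Tokens seen per optimizer step, assuming a single process."""
    return BATCH_SIZE * BLOCK_SIZE * GRAD_ACCUM_STEPS


if __name__ == "__main__":
    for step in (0, 50, 100, 2500, 5000):
        print(step, get_lr(step))
    print("tokens per iteration:", tokens_per_iter())
```

Under these assumptions the rate at step 5000 matches the `lr` column, and each step would process 64 × 256 × 1 tokens.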