Hannibal046's workspace
Runs: 4

Field                          Run 1            Run 2            Run 3            Run 4
State                          Finished         Finished         Crashed          Crashed
Notes                          -                -                -                -
User                           hannibal046      hannibal046      hannibal046      hannibal046
Tags
Created
Runtime                        8d 3h 44m 6s     9d 7h 13m 58s    5d 3h 52m 57s    5d 4h 38m 53s
Sweep                          -                -                -                -
always_save_checkpoint         true             true             true             true
backend                        nccl             nccl             nccl             nccl
batch_size                     12               12               12               12
beta1                          0.9              0.9              0.9              0.9
beta2                          0.95             0.95             0.99             0.95
bias                           false            false            false            false
block_size                     1024             1024             1024             1024
compile                        true             true             true             true
dataset                        openwebtext      openwebtext      openwebtext      openwebtext
decay_lr                       true             true             true             true
device                         cuda             cuda             cuda             cuda
dropout                        0                0                0                0
dtype                          float16          float16          float16          float16
eval_interval                  1000             1000             1000             1000
eval_iters                     200              200              200              200
eval_only                      false            false            false            false
grad_clip                      1                1                1                1
gradient_accumulation_steps    40               40               40               40
init_from                      scratch          scratch          scratch          scratch
learning_rate                  0.0006           0.0006           0.0006           0.0006
log_interval                   10               10               10               10
lr_decay_iters                 600000           600000           600000           600000
max_iters                      600000           600000           600000           600000
min_lr                         0.00006          0.00006          0.00006          0.00006
model_type                     gpt              rwkv             rwkv             gpt
n_embd                         768              768              768              768
n_head                         12               12               12               12
n_layer                        12               12               12               12
out_dir                        out              out              out              out
use_customized_cuda_kernel     true             true             true             true
wandb_log                      true             true             true             true
wandb_project                  nanoRWKV         nanoRWKV         nanoRWKV         nanoRWKV
wandb_run_name                 gpt2-124M        RWKV-130M        RWKV-130M        gpt2-124M
warmup_iters                   2000             2000             2000             2000
weight_decay                   0.1              0.1              0                0.1
iter                           600000           600000           330000           378000
lr                             0.00006          0.00006          0.00028902       0.00022373
mfu                            14.68565         -100             -100             14.53988
train/loss                     2.82708          2.85009          2.86132          2.91059
val/loss                       2.86211          2.88179          2.90861          2.90944
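
The logged lr values can be cross-checked against the schedule settings in the table (learning_rate, min_lr, warmup_iters, lr_decay_iters). The table does not state the schedule formula, so the sketch below assumes a linear warmup followed by cosine decay to min_lr, which is common in nanoGPT-style training scripts. The function name lr_at is hypothetical and only for illustration. Under that assumption, the results land close to the lr logged for the two crashed runs (iter 330000 and 378000) and reach min_lr at iter 600000.

```python
import math

# Values copied from the runs table above.
LEARNING_RATE = 0.0006
MIN_LR = 0.00006
WARMUP_ITERS = 2000
LR_DECAY_ITERS = 600000


def lr_at(it: int) -> float:
    """Learning rate at iteration `it`.

    ASSUMPTION: linear warmup, then cosine decay down to MIN_LR.
    The table only lists the hyperparameters, not the formula.
    """
    if it < WARMUP_ITERS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * it / WARMUP_ITERS
    if it > LR_DECAY_ITERS:
        # Past the decay horizon the rate stays at the floor.
        return MIN_LR
    # Fraction of the decay phase completed, in [0, 1].
    decay_ratio = (it - WARMUP_ITERS) / (LR_DECAY_ITERS - WARMUP_ITERS)
    # Cosine coefficient goes from 1 (start of decay) to 0 (end of decay).
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)


if __name__ == "__main__":
    # Compare against the "iter" and "lr" rows of the table.
    for it in (330000, 378000, 600000):
        print(it, f"{lr_at(it):.8f}")
```

If the printed values match the lr row, the crashed runs stopped partway through the decay phase rather than running with a different schedule.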