nanoRWKV Table – Weights & Biases

Hannibal046's workspace

Runs (4)

#  State     User         Created  Runtime        iter    lr          mfu       train/loss  val/loss
1  Finished  hannibal046  2y ago   8d 3h 44m 6s   600000  0.00006     14.68565  2.82708     2.86211
2  Finished  hannibal046  2y ago   9d 7h 13m 58s  600000  0.00006     -100      2.85009     2.88179
3  Crashed   hannibal046  2y ago   5d 3h 52m 57s  330000  0.00028902  -100      2.86132     2.90861
4  Crashed   hannibal046  2y ago   5d 4h 38m 53s  378000  0.00022373  14.53988  2.91059     2.90944

Other column values per run, in page order (headers not shown on the page):

1: -, -, true, nccl, 12, 0.9, 0.95, false, 1024, true, openwebtext, true, cuda, 0, float16, 1000, 200, false, 1, 40, scratch, 0.0006, 10, 600000, 600000, 0.00006, gpt, 768, 12, 12, out, true, true, nanoRWKV, gpt2-124M, 2000, 0.1

2: -, -, true, nccl, 12, 0.9, 0.95, false, 1024, true, openwebtext, true, cuda, 0, float16, 1000, 200, false, 1, 40, scratch, 0.0006, 10, 600000, 600000, 0.00006, rwkv, 768, 12, 12, out, true, true, nanoRWKV, RWKV-130M, 2000, 0.1

3: -, -, true, nccl, 12, 0.9, 0.99, false, 1024, true, openwebtext, true, cuda, 0, float16, 1000, 200, false, 1, 40, scratch, 0.0006, 10, 600000, 600000, 0.00006, rwkv, 768, 12, 12, out, true, true, nanoRWKV, RWKV-130M, 2000, 0

4: -, -, true, nccl, 12, 0.9, 0.95, false, 1024, true, openwebtext, true, cuda, 0, float16, 1000, 200, false, 1, 40, scratch, 0.0006, 10, 600000, 600000, 0.00006, gpt, 768, 12, 12, out, true, true, nanoRWKV, gpt2-124M, 2000, 0.1