pre-training Table – Weights & Biases


Garg-aayush's workspace

Runs (6)

| Run | State | User | Created | Runtime |
| --- | --- | --- | --- | --- |
| gpt2-swiglu | Crashed | garg-aayush | 4d ago | 1h 33m 31s |
| gpt2-rope | Failed | garg-aayush | 4d ago | 1h 56m 8s |
| gpt2-global-datafix | Failed | garg-aayush | 4d ago | 1h 49m 3s |
| gpt2-lr-inc | Finished | garg-aayush | 6d ago | 1h 55m 53s |
| gpt2-periodicity-fix | Finished | garg-aayush | 6d ago | 1h 55m 59s |
| gpt2-baseline | Finished | garg-aayush | 6d ago | 1h 50m 19s |

| Run | dt | mfu | step | tok/s | train/loss | train/lr | train/norm | val/hella_norm | val/loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt2-swiglu | 325.42872 | 0.25452 | 14697 | 1611068.60234 | 3.03232 | 0.00032305 | 0.1391 | 0.31747 | 3.03106 |
| gpt2-rope | 17789.62946 | 0.23327 | 18794 | 29471.55258 | 2.96514 | 0.00015074 | 0.22232 | 0.31976 | 2.98739 |
| gpt2-global-datafix | 16740.42153 | 0.34117 | 18794 | 31318.68567 | 2.98157 | 0.00015074 | 0.26447 | 0.31468 | 3.0045 |
| gpt2-lr-inc | 10728.30844 | 0.34637 | 19072 | 48869.58675 | 3.01242 | 0.00015 | 0.21636 | 0.31149 | 3.02112 |
| gpt2-periodicity-fix | 10612.93912 | 0.34485 | 19072 | 49400.82988 | 3.05505 | 0.00006 | 0.29316 | 0.30392 | 3.06392 |
| gpt2-baseline | 10666.78572 | 0.34441 | 19072 | 49151.45142 | 3.08455 | 0.00006 | 0.31972 | 0.30263 | 3.06575 |

Config values per run, in column order:

gpt2-swiglu: 64; 1024; /workspace/ckpt; /workspace/shards; 1337; cuda; 250; 989000000000000; 1; 1; 0.0015; 32; 19073; 0.00015; 768; 12; 12; 4; false; true; true; 42; "Hello, I'm a language model,"; 524288; true; 20; 50304; pre-training; gpt2-swiglu; 300; 0.1

gpt2-rope: 64; 1024; /workspace/ckpt; /workspace/shards; 1337; cuda; 250; 989000000000000; 1; 1; 0.0015; 32; 19073; 0.00015; 768; 12; 12; 4; false; true; true; 42; "Hello, I'm a language model,"; 524288; true; 20; 50304; pre-training; gpt2-rope; 300; 0.1

gpt2-global-datafix: 64; 1024; /workspace/ckpt; /workspace/shards; 1337; cuda; 250; 989000000000000; 1; 1; 0.0015; 32; 19073; 0.00015; 768; 12; 12; 4; false; true; true; 42; "Hello, I'm a language model,"; 524288; true; 20; 50304; pre-training; gpt2-global-datafix; 300; 0.1

gpt2-lr-inc: 64; 1024; /workspace/ckpt; /workspace/shards; 1337; cuda; 250; 989000000000000; 1; 1; 0.0015; 32; 19073; 0.00015; 768; 12; 12; 4; false; true; true; 42; "Hello, I'm a language model,"; 524288; true; 20; 50304; pre-training; gpt2-lr-inc; 300; 0.1

gpt2-periodicity-fix: 64; 1024; /workspace/ckpt; /workspace/shards; 1337; cuda; 250; 989000000000000; 1; 1; 0.0006; 32; 19073; 0.00006; 768; 12; 12; 4; true; true; true; 42; "Hello, I'm a language model,"; 524288; true; 20; 50304; pre-training; gpt2-periodicity-fix; 715; 0.1

gpt2-baseline: 64; 1024; /workspace/ckpt; /workspace/shards; -; cuda; 250; 989000000000000; 1; 1; 0.0006; 32; 19073; 0.00006; 768; 12; 12; 4; true; true; true; 42; "Hello, I'm a language model,"; 524288; true; 20; 50304; pre-training; gpt2-baseline; 715; 0.1
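
The dt and tok/s columns appear to be tied together by the 524288 value that every run carries in its config. The sketch below checks that relationship under two assumptions that this table does not label: that 524288 is the number of tokens processed per optimizer step, and that dt is the step time in milliseconds. The helper name `tokens_per_second` is hypothetical and is not taken from the training code.

```python
# Assumption: 524288 (from each run's config) is tokens per optimizer step.
# Assumption: dt is the wall-clock time of one step, in milliseconds.
TOKENS_PER_STEP = 524_288


def tokens_per_second(dt_ms: float, tokens_per_step: int = TOKENS_PER_STEP) -> float:
    """Estimate throughput from a step time given in milliseconds."""
    return tokens_per_step / (dt_ms / 1000.0)


# (dt, tok/s) pairs copied from the runs table.
runs = {
    "gpt2-baseline": (10666.78572, 49151.45142),
    "gpt2-lr-inc": (10728.30844, 48869.58675),
    "gpt2-swiglu": (325.42872, 1611068.60234),
}

for name, (dt_ms, logged) in runs.items():
    estimated = tokens_per_second(dt_ms)
    print(f"{name}: estimated {estimated:.2f} tok/s, logged {logged:.2f} tok/s")
```

If the two assumptions hold, the estimated and logged throughput should agree closely for each run, which makes this a quick sanity check when comparing runs with very different dt values.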
