Eleutherai-oslo's group workspace

2022-09-17 11:17:42
[2022-09-17 11:17:39,604] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 4096.0, reducing to 4096.0
2022-09-17 11:17:48
[2022-09-17 11:17:45,052] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 4096.0, reducing to 2048.0
2022-09-17 11:19:20
[2022-09-17 11:19:20,481] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/3B_95000/zero_to_fp32.py
2022-09-17 11:19:23
[2022-09-17 11:19:20,881] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/3B_95000/global_step113000/zero_pp_rank_24_mp_rank_00_optim_states.pt
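The pair of lines above is DeepSpeed's ZeRO checkpointing: each data-parallel rank writes its own optimizer-state shard (zero_pp_rank_24_mp_rank_00_optim_states.pt on this worker), and a zero_to_fp32.py recovery script is copied into the checkpoint directory so the partitioned states can later be merged into a single fp32 state dict. Below is a minimal sketch of that consolidation step, assuming a DeepSpeed build that exposes get_fp32_state_dict_from_zero_checkpoint; the exact helper and signature may differ in the DeeperSpeed fork used by GPT-NeoX, and the copied zero_to_fp32.py script does the same job as a standalone command.

    # Sketch only: merge the ZeRO-sharded optimizer/parameter states saved above
    # into one fp32 state_dict. Helper name and signature are assumptions about
    # the installed DeepSpeed version.
    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    state_dict = get_fp32_state_dict_from_zero_checkpoint(
        "/fsx/multi-lingual-6b/gpt-neox/checkpoints/3B_95000",  # dir containing global_step113000/
        tag="global_step113000",
    )
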
2022-09-17 11:41:18
[2022-09-17 11:41:15,531] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 2048.0, reducing to 1024.0
2022-09-17 12:53:36
[2022-09-17 12:53:36,176] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/3B_95000/zero_to_fp32.py
2022-09-17 12:53:38
[2022-09-17 12:53:36,372] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/3B_95000/global_step114000/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-09-17 14:27:54
[2022-09-17 14:27:51,815] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/3B_95000/zero_to_fp32.py
2022-09-17 14:27:54
[2022-09-17 14:27:51,826] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/3B_95000/global_step115000/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-09-17 14:49:57
[2022-09-17 14:49:54,683] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 4096.0, reducing to 4096.0
2022-09-17 14:50:03
[2022-09-17 14:50:00,111] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 4096.0, reducing to 2048.0
2022-09-17 14:55:57
[2022-09-17 14:55:53,800] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 2048.0, reducing to 1024.0
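
The "fp16 dynamic loss scale overflow" lines are DeepSpeed's dynamic loss scaling at work: when a gradient overflow is detected, the optimizer step is skipped and the loss scale is halved (4096 -> 2048 -> 1024 in this window), then grown again after a run of overflow-free steps. Below is a hedged sketch of the config section that controls this behavior, using the standard DeepSpeed "fp16" keys; the values are illustrative and not taken from this run.

    # Illustrative values only; the multi-lingual-6b run's actual settings are not
    # shown in this log. "loss_scale": 0 selects dynamic scaling.
    fp16_config = {
        "fp16": {
            "enabled": True,
            "loss_scale": 0,            # 0 => dynamic loss scaling
            "initial_scale_power": 12,  # start at 2**12 = 4096, matching the scale seen above
            "loss_scale_window": 1000,  # grow the scale after this many overflow-free steps
            "hysteresis": 2,            # consecutive overflows tolerated before lowering
            "min_loss_scale": 1,
        }
    }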