Atmallen8's group workspace

2022-11-22 04:38:30
[2022-11-22 04:38:28,944] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/zero_to_fp32.py
2022-11-22 04:38:30
[2022-11-22 04:38:28,951] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/global_step141000/zero_pp_rank_32_mp_rank_00_optim_states.pt
2022-11-22 04:44:27
[2022-11-22 04:44:25,382] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 65536.0, reducing to 65536.0
2022-11-22 04:58:12
[2022-11-22 04:58:11,400] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0
2022-11-22 06:19:16
[2022-11-22 06:19:15,304] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/zero_to_fp32.py
2022-11-22 06:19:16
[2022-11-22 06:19:15,313] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/global_step142000/zero_pp_rank_32_mp_rank_00_optim_states.pt
2022-11-22 06:59:53
[2022-11-22 06:59:52,154] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 65536.0, reducing to 65536.0
2022-11-22 06:59:59
[2022-11-22 06:59:57,962] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0
2022-11-22 07:59:59
[2022-11-22 07:59:58,115] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/zero_to_fp32.py
2022-11-22 07:59:59
[2022-11-22 07:59:58,126] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/global_step143000/zero_pp_rank_32_mp_rank_00_optim_states.pt
2022-11-22 08:00:31
[2022-11-22 08:00:31,282] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/zero_to_fp32.py
2022-11-22 08:00:31
[2022-11-22 08:00:31,293] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/global_step143000/zero_pp_rank_32_mp_rank_00_optim_states.pt
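
The overflow messages above each record the attempted loss scale and the scale that follows it. Below is a minimal sketch, assuming the log has been saved as plain text in the format shown, that pulls those events out so the loss-scale history can be read at a glance. The names `OVERFLOW_RE`, `parse_overflows`, and `sample` are hypothetical helpers for illustration and are not part of DeepSpeed.

```python
import re

# Pattern written against the overflow lines as they appear in this log;
# other DeepSpeed versions may word the message differently (assumption).
OVERFLOW_RE = re.compile(
    r"\[(?P<ts>[\d-]+ [\d:,]+)\] \[INFO\] \[stage1\.py:\d+:step\] .*"
    r"Attempted loss scale: (?P<attempted>[\d.]+), "
    r"reducing to (?P<reduced>[\d.]+)"
)


def parse_overflows(lines):
    """Return (timestamp, attempted_scale, new_scale) for each overflow line."""
    events = []
    for line in lines:
        match = OVERFLOW_RE.search(line)
        if match:
            events.append(
                (
                    match.group("ts"),
                    float(match.group("attempted")),
                    float(match.group("reduced")),
                )
            )
    return events


# Three lines copied from the log above: two overflows and one checkpoint save.
sample = [
    "[2022-11-22 04:44:25,382] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 65536.0, reducing to 65536.0",
    "[2022-11-22 04:58:11,400] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0",
    "[2022-11-22 06:19:15,304] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/2.7B_deduped_newer/zero_to_fp32.py",
]

events = parse_overflows(sample)
for ts, attempted, reduced in events:
    print(ts, attempted, reduced)
```

Lines that are not overflow messages, such as the checkpoint saves, are skipped, so the same function can be run over the whole log file.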