Atmallen8's group workspace

2022-09-15 00:23:21
[2022-09-15 00:23:20,736] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/zero_to_fp32.py
2022-09-15 00:23:21
[2022-09-15 00:23:20,796] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/global_step70500/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-09-15 00:27:57
[2022-09-15 00:27:56,955] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/zero_to_fp32.py
2022-09-15 00:27:57
[2022-09-15 00:27:57,016] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/global_step70750/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-09-15 00:32:33
[2022-09-15 00:32:32,773] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/zero_to_fp32.py
2022-09-15 00:32:33
[2022-09-15 00:32:32,810] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/global_step71000/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-09-15 00:36:51
[2022-09-15 00:36:50,302] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 32768.0, reducing to 32768.0
2022-09-15 00:36:53
[2022-09-15 00:36:51,373] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
2022-09-15 00:37:09
[2022-09-15 00:37:08,694] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/zero_to_fp32.py
2022-09-15 00:37:09
[2022-09-15 00:37:08,742] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/global_step71250/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-09-15 00:37:15
[2022-09-15 00:37:15,024] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/zero_to_fp32.py
2022-09-15 00:37:15
[2022-09-15 00:37:15,073] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/125M-dedup-largeBS/global_step71250/zero_pp_rank_24_mp_rank_00_optim_states.pt