Atmallen8's group workspace

2022-09-17 11:17:38
[2022-09-17 11:17:37,871] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/global_step70250/zero_pp_rank_104_mp_rank_00_optim_states.pt
2022-09-17 11:25:48
[2022-09-17 11:25:46,718] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
2022-09-17 11:30:25
[2022-09-17 11:30:23,723] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/zero_to_fp32.py
2022-09-17 11:30:25
[2022-09-17 11:30:23,923] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/global_step70500/zero_pp_rank_104_mp_rank_00_optim_states.pt
2022-09-17 11:43:15
[2022-09-17 11:43:13,796] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/zero_to_fp32.py
2022-09-17 11:43:15
[2022-09-17 11:43:13,835] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/global_step70750/zero_pp_rank_104_mp_rank_00_optim_states.pt
2022-09-17 11:55:59
[2022-09-17 11:55:59,439] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/zero_to_fp32.py
2022-09-17 11:55:59
[2022-09-17 11:55:59,519] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/global_step71000/zero_pp_rank_104_mp_rank_00_optim_states.pt
2022-09-17 12:08:47
[2022-09-17 12:08:46,705] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/zero_to_fp32.py
2022-09-17 12:08:47
[2022-09-17 12:08:46,729] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/global_step71250/zero_pp_rank_104_mp_rank_00_optim_states.pt
2022-09-17 12:09:03
[2022-09-17 12:09:02,079] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/zero_to_fp32.py
2022-09-17 12:09:03
[2022-09-17 12:09:02,111] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/1.3B-dedup-largeBS/global_step71250/zero_pp_rank_104_mp_rank_00_optim_states.pt
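The 11:25:46 entry reports DeepSpeed's fp16 dynamic loss scaling at work. Gradients overflowed at a loss scale of 32768.0, so the optimizer step was skipped and the scale was halved to 16384.0. The sketch below models only that behavior. The class name, its fields, and the `min_scale` floor are hypothetical illustrations, not DeepSpeed's API. DeepSpeed's real scaler also has settings this sketch omits, such as a window of overflow-free steps after which the scale grows again, and hysteresis.

```python
# Hypothetical, simplified dynamic loss scaler. It models only what the log line
# reports: on an fp16 overflow, skip the step and halve the loss scale.
# Not DeepSpeed's implementation: growth window and hysteresis are omitted.
from dataclasses import dataclass


@dataclass
class DynamicLossScaler:
    scale: float = 32768.0        # starting scale, matching "Attempted loss scale" in the log
    backoff_factor: float = 0.5   # halve on overflow: 32768.0 -> 16384.0
    min_scale: float = 1.0        # assumed floor so the scale never collapses to zero

    def update(self, overflow: bool) -> bool:
        """Return True if the optimizer step should run, False if it is skipped."""
        if overflow:
            # Scaled gradients hit inf/NaN in fp16: discard this step and shrink the scale.
            self.scale = max(self.scale * self.backoff_factor, self.min_scale)
            return False
        # No overflow: the step runs and (in this sketch) the scale is left unchanged.
        return True


scaler = DynamicLossScaler()
ran = scaler.update(overflow=True)
print(f"step run: {ran}, new loss scale: {scaler.scale}")  # step run: False, new loss scale: 16384.0
```

Halving rather than resetting the scale keeps as much fp16 precision headroom as possible while still backing off quickly from repeated overflows.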