Eleutherai-oslo's group workspace

[2022-09-12 16:09:24,396] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/1B_scratch/zero_to_fp32.py
[2022-09-12 16:09:24,664] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/1B_scratch/global_step189000/zero_pp_rank_104_mp_rank_00_optim_states.pt
[2022-09-12 16:16:31,936] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 64.0, reducing to 32.0
[2022-09-12 16:23:35,467] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/1B_scratch/zero_to_fp32.py
[2022-09-12 16:23:35,754] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/1B_scratch/global_step189500/zero_pp_rank_104_mp_rank_00_optim_states.pt
[2022-09-12 16:25:24,877] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 32.0, reducing to 16.0
[2022-09-12 16:37:45,616] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/1B_scratch/zero_to_fp32.py
[2022-09-12 16:37:45,905] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/1B_scratch/global_step190000/zero_pp_rank_104_mp_rank_00_optim_states.pt
[2022-09-12 16:52:00,948] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/1B_scratch/zero_to_fp32.py
[2022-09-12 16:52:01,041] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/1B_scratch/global_step190500/zero_pp_rank_104_mp_rank_00_optim_states.pt