Eleutherai-oslo's group workspace

2022-11-17 00:58:45
[2022-11-17 00:58:45,581] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/13B_scratch/zero_to_fp32.py
2022-11-17 00:58:45
[2022-11-17 00:58:45,585] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/13B_scratch/global_step18000/zero_pp_rank_2_mp_rank_00_optim_states.pt
2022-11-17 01:02:46
[2022-11-17 01:02:44,036] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 262144.0, reducing to 262144.0
2022-11-17 01:24:05
[2022-11-17 01:24:03,872] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0
2022-11-17 02:46:23
[2022-11-17 02:46:21,656] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/13B_scratch/zero_to_fp32.py
2022-11-17 02:46:23
[2022-11-17 02:46:21,661] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/13B_scratch/global_step19000/zero_pp_rank_2_mp_rank_00_optim_states.pt
2022-11-17 03:33:40
[2022-11-17 03:33:38,923] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 262144.0, reducing to 262144.0
2022-11-17 03:38:03
[2022-11-17 03:38:00,841] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0
2022-11-17 03:53:06
[2022-11-17 03:53:04,447] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 131072.0, reducing to 65536.0
2022-11-17 04:34:00
[2022-11-17 04:33:58,556] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/multi-lingual-6b/gpt-neox/checkpoints/13B_scratch/zero_to_fp32.py
2022-11-17 04:34:00
[2022-11-17 04:33:58,561] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/multi-lingual-6b/gpt-neox/checkpoints/13B_scratch/global_step20000/zero_pp_rank_2_mp_rank_00_optim_states.pt
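
The overflow lines above follow a fixed pattern ("Attempted loss scale: X, reducing to Y"), so the loss-scale history can be pulled out of the log mechanically. The following is a minimal sketch, not part of DeepSpeed or GPT-NeoX: the regular expression is written against the line format seen in this log only, and the names `OVERFLOW_RE` and `parse_overflows` are hypothetical helpers introduced for illustration.

```python
import re

# Assumption: overflow lines look exactly like the ones in this log, e.g.
# "[<timestamp>] [INFO] [stage1.py:697:step] ... Attempted loss scale: X, reducing to Y"
OVERFLOW_RE = re.compile(
    r"\[(?P<ts>[\d\-]+ [\d:,]+)\].*"
    r"Attempted loss scale: (?P<old>[\d.]+), reducing to (?P<new>[\d.]+)"
)


def parse_overflows(lines):
    """Return (timestamp, attempted_scale, new_scale) for each overflow line."""
    events = []
    for line in lines:
        match = OVERFLOW_RE.search(line)
        if match:
            events.append(
                (match.group("ts"), float(match.group("old")), float(match.group("new")))
            )
    return events


def unchanged_scale_events(events):
    """Overflow events where the step was skipped but the scale was not lowered."""
    return [event for event in events if event[1] == event[2]]
```

Applied to this log, `parse_overflows` would pick up the five overflow lines, and `unchanged_scale_events` would single out the two where the scale stays at 262144.0. That pattern is consistent with a hysteresis setting that tolerates one overflow before halving the scale, though the log alone does not confirm the configuration.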