Eleutherai-oslo's group workspace

2022-11-15 16:11:25
%comms: 7.81802698648795
2022-11-15 16:11:25
 %optimizer_step 0.05621542625580694
2022-11-15 16:11:25
 %forward: 24.38650879852948
2022-11-15 16:11:25
 %backward: 64.51560601916827
2022-11-15 16:11:25
[2022-11-15 16:11:24,286] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 2061.80 | forward: 20478.46 | backward_microstep: 54178.19 | backward: 54176.68 | backward_inner_microstep: 54172.18 | backward_inner: 54170.59 | backward_allreduce_microstep: 2.23 | backward_allreduce: 0.80 | reduce_tied_grads: 0.27 | comms: 6565.15 | reduce_grads: 6564.87 | step: 47.21 | _step_clipping: 0.12 | _step_step: 44.64 | _step_zero_grad: 1.38 | _step_check_overflow: 0.36
2022-11-15 16:11:33
[2022-11-15 16:11:32,222] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 8388608.0, reducing to 4194304.0
2022-11-15 16:11:41
[2022-11-15 16:11:40,173] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 4194304.0, reducing to 2097152.0
2022-11-15 16:11:49
[2022-11-15 16:11:48,107] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 2097152.0, reducing to 1048576.0
2022-11-15 16:11:57
[2022-11-15 16:11:56,056] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 1048576.0, reducing to 524288.0
2022-11-15 16:12:05
[2022-11-15 16:12:03,998] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 524288.0, reducing to 262144.0
2022-11-15 16:12:13
[2022-11-15 16:12:11,974] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0