Preetham-gali's group workspace
2021-09-02 16:53:07 | samples/sec: 89.426 | iteration 200/250000 | elapsed time per iteration (ms): 5814.9 | learning rate: 2.052E-05 | approx flops per GPU: 125.2 TFLOPS | loss: 9.995406E+00 | lm_loss: 9.547047E+00 | kld_loss: 4.483623E-01 | mse_loss: 0.000000E+00 | loss scale: 16.0 | number of skipped iterations: 10 | number of nan iterations: 10 |
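As a sanity check on the iteration line above, the reported throughput is consistent with samples/sec = global_batch_size / iteration_time. The global batch size of 520 used below is inferred from the logged numbers, not stated anywhere in the log:

```python
# Sketch: reproduce the logged throughput from the per-iteration timing.
# GLOBAL_BATCH_SIZE = 520 is an inference from the log, not stated in it.
GLOBAL_BATCH_SIZE = 520
ITER_TIME_MS = 5814.9  # "elapsed time per iteration (ms)" from the log

samples_per_sec = GLOBAL_BATCH_SIZE / (ITER_TIME_MS / 1000.0)
print(f"samples/sec: {samples_per_sec:.3f}")  # close to the logged 89.426
```

A batch of 520 reproduces the logged 89.426 samples/sec to within rounding, which also means the two numbers in this line are internally consistent.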
[2021-09-02 16:53:12,273] [INFO] [stage1.py:695:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 16.0, reducing to 8.0
[2021-09-02 16:53:17,961] [INFO] [stage1.py:695:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 8.0, reducing to 4.0
[2021-09-02 16:53:29,411] [INFO] [stage1.py:695:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 4.0, reducing to 2.0
[2021-09-02 16:53:46,609] [INFO] [stage1.py:695:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 2.0, reducing to 1.0
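The four messages above show fp16 dynamic loss scaling at work: whenever the scaled gradients contain inf/NaN, the optimizer step is skipped and the loss scale is halved (16 → 8 → 4 → 2 → 1). A minimal sketch of that policy follows; this is not DeepSpeed's actual implementation, and all names and the growth interval are illustrative:

```python
import math

class DynamicLossScaler:
    """Toy fp16 dynamic loss scaler: halve the scale and skip the step
    on gradient overflow; grow the scale back after a run of clean steps."""

    def __init__(self, init_scale=16.0, min_scale=1.0, growth_interval=1000):
        self.scale = init_scale
        self.min_scale = min_scale
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def step(self, grads):
        """Return True if the optimizer step should run, False if skipped."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            attempted = self.scale
            self.scale = max(self.scale / 2.0, self.min_scale)
            self.clean_steps = 0
            print(f"overflow! Skipping step. Attempted loss scale: "
                  f"{attempted}, reducing to {self.scale}")
            return False
        self.clean_steps += 1
        if self.clean_steps % self.growth_interval == 0:
            self.scale *= 2.0  # cautiously raise the scale again
        return True
```

Feeding this four consecutive overflowing gradient batches reproduces the 16 → 8 → 4 → 2 → 1 sequence seen in the log, and each skipped step also increments the `skipped` counter reported at step 210 below.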
[2021-09-02 16:54:03,858] [INFO] [logging.py:60:log_dist] [Rank 0] step=210, skipped=33, lr=[2.124e-05, 2.124e-05], mom=[[0.9, 0.999], [0.9, 0.999]]
2021-09-02 16:54:04 | steps: 210 | loss: 9.8110 | iter time (s): 5.726 | samples/sec: 90.817
2021-09-02 16:54:04 | %comms: 2.68 | %optimizer_step: 0.77 | %forward: 44.04 | %backward: 39.01
[2021-09-02 16:54:03,858] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 157.09 | forward: 25217.69 | backward_microstep: 22337.56 | backward: 22334.70 | backward_inner_microstep: 22326.04 | backward_inner: 22322.08 | backward_allreduce_microstep: 4.22 | backward_allreduce: 1.45 | reduce_tied_grads: 0.39 | comms: 1532.51 | reduce_grads: 1198.55 | step: 441.56 | _step_clipping: 0.11 | _step_step: 439.74 | _step_zero_grad: 0.51 | _step_check_overflow: 0.54
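The %forward / %backward / %comms / %optimizer_step figures are consistent with the raw rank-0 timers above divided by the wall time of the logging window: 10 iterations (steps 200 → 210, a window size inferred from the step numbers, not stated explicitly) at 5.726 s each. A sketch of that bookkeeping:

```python
# Timer totals (ms) taken from the rank-0 breakdown in the log,
# accumulated over the 10 iterations between step 200 and step 210.
timers_ms = {
    "forward": 25217.69,
    "backward": 22334.70,
    "comms": 1532.51,
    "optimizer_step": 441.56,  # the "step" timer in the log
}
window_ms = 10 * 5.726 * 1000  # 10 iterations at 5.726 s each

percentages = {k: 100.0 * v / window_ms for k, v in timers_ms.items()}
for name, pct in percentages.items():
    print(f"%{name}: {pct:.2f}")
```

The computed values match the logged percentages (44.04, 39.01, 2.68, 0.77) to within rounding, confirming the window-size assumption. The remaining ~13% of wall time is unaccounted for by these four timers; batch_input and reduce_grads cover part of it.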