Preetham-gali's group workspace

2021-10-02 16:03:43
steps: 530 loss: 17.3997 iter time (s): 39.796 samples/sec: 13.569
2021-10-02 16:03:43
%comms: 0.19496476265044171
2021-10-02 16:03:43
 %optimizer_step 0.15753340507347366
2021-10-02 16:03:43
 %forward: 49.67971193735536
2021-10-02 16:03:43
 %backward: 36.86526403961709
2021-10-02 16:03:43
[2021-10-02 16:03:43,188] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 524.33 | forward: 197706.90 | backward_microstep: 146720.74 | backward: 146710.13 | backward_inner_microstep: 146677.21 | backward_inner: 146665.56 | backward_allreduce_microstep: 16.43 | backward_allreduce: 5.70 | reduce_tied_grads: 0.53 | comms: 775.89 | reduce_grads: 560.48 | step: 626.92 | _step_clipping: 0.20 | _step_step: 622.98 | _step_zero_grad: 1.39 | _step_check_overflow: 1.19
2021-10-02 16:10:24
[2021-10-02 16:10:21,433] [INFO] [logging.py:60:log_dist] [Rank 0] step=540, skipped=18, lr=[6.264e-05, 6.264e-05], mom=[[0.9, 0.999], [0.9, 0.999]]
2021-10-02 16:10:24
steps: 540 loss: 17.3551 iter time (s): 39.820 samples/sec: 13.561
2021-10-02 16:10:24
%comms: 0.1949968451697898
2021-10-02 16:10:24
 %optimizer_step 0.15673647212859304
2021-10-02 16:10:24
 %forward: 49.69180844682131
2021-10-02 16:10:24
 %backward: 36.85016684110144
2021-10-02 16:10:24
[2021-10-02 16:10:21,434] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 536.30 | forward: 197874.34 | backward_microstep: 146749.09 | backward: 146738.52 | backward_inner_microstep: 146704.16 | backward_inner: 146692.28 | backward_allreduce_microstep: 17.47 | backward_allreduce: 6.06 | reduce_tied_grads: 0.58 | comms: 776.48 | reduce_grads: 560.91 | step: 624.13 | _step_clipping: 0.21 | _step_step: 620.44 | _step_zero_grad: 1.26 | _step_check_overflow: 0.91
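
The percentage lines above appear to be each timer's share of the wall-clock time for one logging interval. The sketch below checks that reading against the step-530 report. It rests on two assumptions that are inferred from the numbers, not taken from the logging code: the timers are in milliseconds and accumulate over the whole interval, and an interval is 10 steps (the step counter moves from 530 to 540 between reports). The helper names `parse_timers` and `percent_of_interval` are hypothetical.

```python
import re

# Timer line from the step-530 report above, abridged to the timers used here.
TIMER_LINE = (
    "rank=0 time (ms) | train_batch: 0.00 | batch_input: 524.33 | "
    "forward: 197706.90 | backward: 146710.13 | comms: 775.89 | step: 626.92"
)


def parse_timers(line):
    """Return {timer_name: milliseconds} from a 'name: value | name: value' line."""
    return {name: float(value) for name, value in re.findall(r"(\w+): ([\d.]+)", line)}


def percent_of_interval(timers, iter_time_s, steps_per_log):
    """Express each timer as a percentage of the interval's wall-clock time.

    Assumption: interval time = per-iteration time * steps per logging interval.
    """
    interval_ms = iter_time_s * 1000.0 * steps_per_log
    return {name: 100.0 * ms / interval_ms for name, ms in timers.items()}


timers = parse_timers(TIMER_LINE)
# iter time (s) comes from the "steps: 530" line; steps_per_log=10 is inferred.
shares = percent_of_interval(timers, iter_time_s=39.796, steps_per_log=10)

for name in ("forward", "backward", "comms", "step"):
    print(f"%{name}: {shares[name]:.3f}")
```

Under these assumptions the computed shares agree with the logged `%forward`, `%backward`, `%comms`, and `%optimizer_step` values to within about 0.01 percentage points; the small residual is consistent with the iteration time being rounded to three decimals in the log. Forward plus backward account for roughly 86.5% of the interval, while communication and the optimizer step are each under 0.2%.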