Preetham-gali's group workspace

2021-09-07 21:20:32
steps: 232960 loss: 4.1932 iter time (s): 4.149 samples/sec: 38.566
2021-09-07 21:20:32
%comms: 0.8631095136372842
2021-09-07 21:20:32
 %optimizer_step 0.5713453059508065
2021-09-07 21:20:32
 %forward: 50.927182899723256
2021-09-07 21:20:32
 %backward: 31.572261992986828
2021-09-07 21:20:32
[2021-09-07 21:20:30,089] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 226.92 | forward: 21128.44 | backward_microstep: 13102.60 | backward: 13098.56 | backward_inner_microstep: 13086.77 | backward_inner: 13082.59 | backward_allreduce_microstep: 5.83 | backward_allreduce: 2.00 | reduce_tied_grads: 0.30 | comms: 358.08 | reduce_grads: 325.25 | step: 237.04 | _step_clipping: 0.12 | _step_step: 235.21 | _step_zero_grad: 0.47 | _step_check_overflow: 0.59
2021-09-07 21:21:12
[2021-09-07 21:21:11,805] [INFO] [logging.py:60:log_dist] [Rank 0] step=232970, skipped=299, lr=[4.632701762326269e-06, 4.632701762326269e-06], mom=[[0.9, 0.999], [0.9, 0.999]]
2021-09-07 21:21:12
steps: 232970 loss: 4.1701 iter time (s): 4.170 samples/sec: 38.374
2021-09-07 21:21:12
%comms: 0.8637151706121668
2021-09-07 21:21:12
 %optimizer_step 0.57595745131733
2021-09-07 21:21:12
 %forward: 50.65697157970237
2021-09-07 21:21:12
 %backward: 31.39650713422838
2021-09-07 21:21:12
[2021-09-07 21:21:11,806] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 217.02 | forward: 21121.46 | backward_microstep: 13094.90 | backward: 13090.80 | backward_inner_microstep: 13078.87 | backward_inner: 13074.70 | backward_allreduce_microstep: 5.94 | backward_allreduce: 2.05 | reduce_tied_grads: 0.30 | comms: 360.13 | reduce_grads: 325.32 | step: 240.15 | _step_clipping: 0.12 | _step_step: 238.24 | _step_zero_grad: 0.48 | _step_check_overflow: 0.65