Preetham-gali's group workspace

2021-08-17 11:44:41
 %backward: 74.89246340358466
2021-08-17 11:44:41
[2021-08-17 11:44:40,342] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 268.19 | forward: 29371.64 | backward_microstep: 102980.95 | backward: 102974.29 | backward_inner_microstep: 102951.78 | backward_inner: 102944.18 | backward_allreduce_microstep: 11.84 | backward_allreduce: 4.06 | reduce_tied_grads: 0.44 | comms: 353.06 | reduce_grads: 300.43 | step: 408.20 | _step_clipping: 0.11 | _step_step: 406.13 | _step_zero_grad: 0.80 | _step_check_overflow: 0.55
2021-08-17 11:46:57
[2021-08-17 11:46:57,891] [INFO] [logging.py:60:log_dist] [Rank 0] step=1000, skipped=18, lr=[9.20625e-05], mom=[[0.9, 0.999]]
2021-08-17 11:46:57
steps: 1000 loss: 3.6874 iter time (s): 13.754 samples/sec: 39.262
2021-08-17 11:46:57
%comms: 0.2610759490516577
2021-08-17 11:46:57
 %optimizer_step 0.2958599806630684
2021-08-17 11:46:57
 %forward: 21.351871790196995
2021-08-17 11:46:57
 %backward: 74.8878157428386
2021-08-17 11:46:57
[2021-08-17 11:46:57,892] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 280.15 | forward: 29366.53 | backward_microstep: 103004.51 | backward: 102997.77 | backward_inner_microstep: 102975.14 | backward_inner: 102967.50 | backward_allreduce_microstep: 11.81 | backward_allreduce: 4.07 | reduce_tied_grads: 0.41 | comms: 359.07 | reduce_grads: 307.32 | step: 406.91 | _step_clipping: 0.11 | _step_step: 404.87 | _step_zero_grad: 0.81 | _step_check_overflow: 0.52
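The `%forward`, `%backward`, `%comms` and `%optimizer_step` figures appear to be each timer's share of the wall-clock time in one logging window. The sketch below parses a `time (ms)` line and recomputes those shares. It assumes that the timers accumulate over a window of 10 iterations; that interval is inferred from the numbers and is not stated anywhere in the log. `parse_timers` and `share_percent` are hypothetical helper names, not part of any library.

```python
def parse_timers(line):
    """Return {timer_name: milliseconds} from a 'time (ms) | name: value | ...' line."""
    # Everything before "time (ms)" is the logger prefix (timestamp, rank, ...).
    _, _, tail = line.partition("time (ms)")
    timers = {}
    for field in tail.split("|"):
        name, sep, value = field.partition(":")
        if sep:  # skip empty fields that carry no "name: value" pair
            timers[name.strip()] = float(value)
    return timers


def share_percent(timer_ms, iter_time_s, log_interval=10):
    """Percentage of the logging window spent in one timer.

    log_interval=10 is an assumption inferred from the logged values.
    """
    window_ms = iter_time_s * 1000.0 * log_interval
    return 100.0 * timer_ms / window_ms


line = ("[Rank 0] rank=0 time (ms) | train_batch: 0.00 | forward: 29366.53 "
        "| backward: 102997.77 | comms: 359.07 | step: 406.91")
timers = parse_timers(line)
for name in ("forward", "backward", "comms", "step"):
    print(name, round(share_percent(timers[name], iter_time_s=13.754), 2))
```

With the iteration time of 13.754 s reported at step 1000, the recomputed shares agree with the logged percentages to about two decimal places; the small residual comes from the rounded iteration time.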
2021-08-17 11:46:57
 samples/sec: 39.275 | iteration     1000/  320000 | elapsed time per iteration (ms): 13749.4 | learning rate: 9.206E-05 | approx flops per GPU: 109.6TFLOPS | loss: 3.759967E+00 | loss scale: 32768.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
2021-08-17 11:46:57
time (ms)
2021-08-17 11:47:32
------------------------------------------------------------------------------------------
2021-08-17 11:47:32
 validation loss at iteration 1000 | loss value: 3.665482E+00 | loss PPL: 3.907495E+01 |
2021-08-17 11:47:32
------------------------------------------------------------------------------------------
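
The `loss PPL` value on the validation line is consistent with perplexity computed as the exponential of the loss (natural log base). A minimal check, where `perplexity` is a hypothetical helper name:

```python
import math


def perplexity(loss):
    """Perplexity for a cross-entropy loss expressed in nats."""
    return math.exp(loss)


# Validation loss logged at iteration 1000.
print(f"{perplexity(3.665482):.4f}")
```

The printed value can be compared against the logged `loss PPL: 3.907495E+01`.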