Chilli's group workspace

2023-08-03 08:01:51
 %backward: 65.49632803353117
2023-08-03 08:01:51
[2023-08-03 08:01:50,519] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 16.22 | forward: 1297.50 | backward_microstep: 3725.72 | backward: 3725.69 | backward_inner_microstep: 3725.64 | backward_inner: 3725.61 | backward_allreduce_microstep: 0.03 | backward_allreduce: 0.01 | reduce_tied_grads: 0.03 | comms: 504.26 | reduce_grads: 318.17 | step: 282.11 | _step_clipping: 0.01 | _step_step: 281.67 | _step_zero_grad: 0.06 | _step_check_overflow: 0.29
2023-08-03 08:01:51
 samples/sec: 179.635 | iteration       12/  143000 | elapsed time per iteration (ms): 5700.4 | learning rate: 1.343E-06 | approx flops per GPU: 153.6TFLOPS | lm_loss: 9.554676E+00 | loss scale: 4096.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
2023-08-03 08:01:51
time (ms)
2023-08-03 08:01:58
[2023-08-03 08:01:56,190] [INFO] [logging.py:60:log_dist] [Rank 0] step=13, skipped=0, lr=[1.4545454545454546e-06, 1.4545454545454546e-06], mom=[[0.9, 0.95], [0.9, 0.95]]
2023-08-03 08:01:58
steps: 13 loss: 9.4703 iter time (s): 5.661 samples/sec: 180.872
2023-08-03 08:01:58
%comms: 8.318295328461737
2023-08-03 08:01:58
 %optimizer_step 4.5510461216813125
2023-08-03 08:01:58
 %forward: 23.002565448326823
2023-08-03 08:01:58
 %backward: 65.80612683082296
2023-08-03 08:01:58
[2023-08-03 08:01:56,192] [INFO] [logging.py:60:log_dist] [Rank 0] rank=0 time (ms) | train_batch: 0.00 | batch_input: 23.25 | forward: 1302.28 | backward_microstep: 3725.62 | backward: 3725.60 | backward_inner_microstep: 3725.54 | backward_inner: 3725.51 | backward_allreduce_microstep: 0.03 | backward_allreduce: 0.01 | reduce_tied_grads: 0.03 | comms: 470.94 | reduce_grads: 304.07 | step: 257.66 | _step_clipping: 0.01 | _step_step: 257.09 | _step_zero_grad: 0.06 | _step_check_overflow: 0.40
2023-08-03 08:01:58
 samples/sec: 180.585 | iteration       13/  143000 | elapsed time per iteration (ms): 5670.4 | learning rate: 1.455E-06 | approx flops per GPU: 154.4TFLOPS | lm_loss: 9.470345E+00 | loss scale: 4096.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
2023-08-03 08:01:58
time (ms)