Preetham-gali's group workspace

2021-09-02 18:44:27
    loss = model.train_batch(data_iter=data_iterator)
2021-09-02 18:44:27
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 305, in train_batch
2021-09-02 18:44:27
    self._exec_schedule(sched)
2021-09-02 18:44:27
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1306, in _exec_schedule
2021-09-02 18:44:27
    self._exec_instr(**cmd.kwargs)
2021-09-02 18:44:27
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 1119, in _exec_optimizer_step
2021-09-02 18:44:27
    self._take_model_step(lr_kwargs)
2021-09-02 18:44:27
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1167, in _take_model_step
2021-09-02 18:44:27
    self.optimizer.step(comms_timer=self.timers('comms'))
2021-09-02 18:44:27
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/runtime/zero/stage1.py", line 691, in step
2021-09-02 18:44:27
    self._update_scale(self.overflow)
2021-09-02 18:44:27
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/runtime/zero/stage1.py", line 809, in _update_scale
2021-09-02 18:44:27
    self.loss_scaler.update_scale(has_overflow)
2021-09-02 18:44:27
  File "/home/mchorse/.local/lib/python3.8/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 155, in update_scale
2021-09-02 18:44:27
    raise Exception("Current loss scale already at minimum - cannot decrease scale anymore. Exiting "
2021-09-02 18:44:27
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
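
The exception above is raised from a dynamic loss scaler during fp16 training: the scale is cut each time gradients overflow, and the run aborts once the scale cannot be cut any further. The following is a minimal sketch of that general mechanism, not DeepSpeed's actual implementation. The class `ToyDynamicLossScaler`, the helper `overflows_until_failure`, and all parameter defaults are hypothetical and chosen only for illustration.

```python
class ToyDynamicLossScaler:
    """Illustrative dynamic loss scaler (hypothetical, not DeepSpeed's code)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0,
                 scale_window=1000, min_scale=1.0):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window  # clean steps needed before growing
        self.min_scale = min_scale
        self.good_steps = 0

    def update_scale(self, has_overflow):
        if has_overflow:
            # An overflow at the floor means there is no smaller scale to try.
            if self.cur_scale <= self.min_scale:
                raise RuntimeError("loss scale already at minimum")
            self.cur_scale = max(self.cur_scale / self.scale_factor,
                                 self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            # Grow the scale again after a full window of clean steps.
            if self.good_steps % self.scale_window == 0:
                self.cur_scale *= self.scale_factor


def overflows_until_failure(scaler):
    """Count how many consecutive overflows the scaler absorbs before raising."""
    absorbed = 0
    try:
        while True:
            scaler.update_scale(True)
            absorbed += 1
    except RuntimeError:
        return absorbed


if __name__ == "__main__":
    print(overflows_until_failure(ToyDynamicLossScaler()))
```

In this sketch, a scaler that starts at 2**16 with a floor of 1 absorbs 16 consecutive overflows and fails on the 17th. A failure of this kind usually means gradients are overflowing on every step (for example from NaN/Inf in the loss or a learning rate that is too high), so the cause is more likely upstream of the scaler than in the scaler itself.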