Kojima-takeshi188's group workspace

2024-05-14 04:45:21
    accelerator.backward(loss)
2024-05-14 04:45:21
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1847, in backward
2024-05-14 04:45:21
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
2024-05-14 04:45:21
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/deepspeed.py", line 176, in backward
2024-05-14 04:45:21
    self.engine.step()
2024-05-14 04:45:21
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 2087, in step
2024-05-14 04:45:21
    self._take_model_step(lr_kwargs)
2024-05-14 04:45:21
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1994, in _take_model_step
2024-05-14 04:45:21
    self.optimizer.step()
2024-05-14 04:45:21
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1662, in step
2024-05-14 04:45:21
    self._update_scale(self.overflow)
2024-05-14 04:45:21
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1908, in _update_scale
2024-05-14 04:45:21
    self.loss_scaler.update_scale(has_overflow)
2024-05-14 04:45:21
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/fp16/loss_scaler.py", line 175, in update_scale
2024-05-14 04:45:21
    raise Exception(
2024-05-14 04:45:21
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
2024-05-14 04:45:22
Running Validation...
2024-05-14 04:45:22
[2024-05-14 13:45:19,069] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2, reducing to 1
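
The traceback and the OVERFLOW line above fit the usual behavior of a dynamic fp16 loss scaler: each overflowing step is skipped and the scale is reduced (here from 2 to 1), and a further overflow at the minimum scale aborts the run. The following is a minimal sketch of that logic, not DeepSpeed's actual code. The class name `LossScalerSketch`, its parameters, and the simple halving rule are assumptions made for illustration, and details such as hysteresis are left out.

```python
class LossScalerSketch:
    """Hypothetical, simplified dynamic loss scaler (illustration only)."""

    def __init__(self, init_scale=2.0, scale_factor=2.0,
                 min_scale=1.0, scale_window=1000):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.min_scale = min_scale
        self.scale_window = scale_window  # clean steps needed before growing
        self.clean_steps = 0

    def update_scale(self, overflow):
        if overflow:
            # Already at the floor: nothing left to reduce, so give up.
            if self.cur_scale <= self.min_scale:
                raise RuntimeError(
                    "Current loss scale already at minimum - "
                    "cannot decrease scale anymore. Exiting run."
                )
            # Otherwise the step is skipped and the scale is reduced.
            self.cur_scale = max(self.cur_scale / self.scale_factor,
                                 self.min_scale)
            self.clean_steps = 0
        else:
            # After enough overflow-free steps, try a larger scale again.
            self.clean_steps += 1
            if self.clean_steps % self.scale_window == 0:
                self.cur_scale *= self.scale_factor


scaler = LossScalerSketch(init_scale=2.0)
scaler.update_scale(overflow=True)       # 2 -> 1, step skipped
try:
    scaler.update_scale(overflow=True)   # already at minimum
except RuntimeError as err:
    print(err)
```

In this sketch the second consecutive overflow raises, matching the order of events in the log: the reduction from 2 to 1 comes first, then the "already at minimum" exception. Repeated overflows all the way down to scale 1 usually mean that the loss or gradients are non-finite (inf/NaN) regardless of scaling, so the scaler is reporting the failure rather than causing it.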