Kojima-takeshi188's group workspace

2024-03-01 11:54:54
    self._update_scale(self.overflow)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1908, in _update_scale
    self.loss_scaler.update_scale(has_overflow)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/fp16/loss_scaler.py", line 175, in update_scale
    raise Exception(
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
[2024-03-01 20:54:51,707] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 128, reducing to 64
[2024-03-01 20:54:51,880] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 64, reducing to 32
[2024-03-01 20:54:52,042] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32, reducing to 16
[2024-03-01 20:54:52,203] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16, reducing to 8
[2024-03-01 20:54:52,364] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8, reducing to 4
[2024-03-01 20:54:52,526] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4, reducing to 2
[2024-03-01 20:54:52,687] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2, reducing to 1
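
The overflow messages above show the loss scale being halved on every skipped step (128, 64, ..., 1), and the traceback shows the run aborting once the scale cannot be reduced further. The following is a minimal sketch of that behavior, not DeepSpeed's actual implementation: `ToyDynamicLossScaler`, `MinScaleReached`, and `simulate` are hypothetical names, the floor of 1 and the halving factor of 2 are read off the log lines, and details such as growing the scale again after a run of overflow-free steps are left out.

```python
class MinScaleReached(Exception):
    """Raised when an overflow occurs while the scale is already at its floor."""


class ToyDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (illustration only)."""

    def __init__(self, init_scale=128.0, min_scale=1.0, scale_factor=2.0):
        self.scale = init_scale          # current loss scale
        self.min_scale = min_scale       # assumed floor, taken from the log
        self.scale_factor = scale_factor  # assumed halving factor, taken from the log

    def update_scale(self, has_overflow):
        """Shrink the scale after an overflow; fail if it is already at the floor."""
        if not has_overflow:
            return self.scale  # this sketch never grows the scale
        if self.scale <= self.min_scale:
            raise MinScaleReached(
                "Current loss scale already at minimum - cannot decrease scale anymore."
            )
        self.scale = max(self.scale / self.scale_factor, self.min_scale)
        return self.scale


def simulate(num_overflows, init_scale=128.0):
    """Feed consecutive overflows and record (attempted, reduced) scale pairs."""
    scaler = ToyDynamicLossScaler(init_scale=init_scale)
    history = []
    for _ in range(num_overflows):
        attempted = scaler.scale
        reduced = scaler.update_scale(True)
        history.append((attempted, reduced))
    return history


if __name__ == "__main__":
    for attempted, reduced in simulate(7):
        print(f"Attempted loss scale: {attempted:g}, reducing to {reduced:g}")
```

Under these assumptions, seven consecutive overflows starting from 128 reproduce the sequence in the log, and an eighth overflow raises the exception. Because every step overflowed even as the scale fell to 1, lowering the scale alone did not help in this run, which suggests the overflow may come from non-finite values in the loss or gradients rather than from the scale itself.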