Eleutherai-oslo's group workspace

2023-04-20 01:28:33 [2023-04-20 01:28:31,838] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
2023-04-20 01:28:33 make: Entering directory '/fsx/polyglot.train/gpt-neox/megatron/data'
2023-04-20 01:28:33 make: Nothing to be done for 'default'.
2023-04-20 01:28:33 make: Leaving directory '/fsx/polyglot.train/gpt-neox/megatron/data'
2023-04-20 01:28:33 FAILED !!!
2023-04-20 01:28:34 Traceback (most recent call last):
2023-04-20 01:28:34   File "/fsx/polyglot.train/gpt-neox/train.py", line 27, in <module>
2023-04-20 01:28:34     pretrain(neox_args=neox_args)
2023-04-20 01:28:34   File "/fsx/polyglot.train/gpt-neox/megatron/training.py", line 104, in pretrain
2023-04-20 01:28:34     model, optimizer, lr_scheduler = setup_model_and_optimizer(
2023-04-20 01:28:34   File "/fsx/polyglot.train/gpt-neox/megatron/training.py", line 440, in setup_model_and_optimizer
2023-04-20 01:28:34     optimizer, param_groups = get_optimizer(model=model, neox_args=neox_args)
2023-04-20 01:28:34 UnboundLocalError: local variable 'model' referenced before assignment
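
The final error means that `model` was read inside `setup_model_and_optimizer` on a code path where it had never been assigned. A minimal sketch of how that can happen is shown here, assuming the assignment sits in a `try` block whose failure is reported and then swallowed. That assumption is not confirmed by the log, and the function bodies are hypothetical stand-ins rather than the actual gpt-neox code.

```python
# Hypothetical reproduction: the function names echo the traceback,
# but every body here is invented for illustration.

def build_model():
    """Stand-in for a model construction step that fails."""
    raise RuntimeError("simulated model build failure")


def setup_model_and_optimizer():
    try:
        model = build_model()
    except RuntimeError:
        # Assumed pattern: the failure is reported but not re-raised.
        print("FAILED !!!")
    # 'model' was never bound on the failure path, so reading it here
    # raises UnboundLocalError and hides the original RuntimeError.
    return model


def reproduce():
    """Return the name of the exception type raised by the sketch above."""
    try:
        setup_model_and_optimizer()
    except UnboundLocalError as exc:
        return type(exc).__name__
    return None
```

If the real code follows a similar shape, the `UnboundLocalError` is only a symptom, and the underlying cause would be whatever prevented the assignment in the first place. Re-raising inside the handler, or checking earlier output for the first reported failure, would surface it.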