Eleutherai-oslo's group workspace

2023-04-20 01:44:26
MPU MP: [4, 5, 6, 7]
2023-04-20 01:44:26
MPU MP: [8, 9, 10, 11]
2023-04-20 01:44:26
MPU MP: [12, 13, 14, 15]
2023-04-20 01:44:26
> setting random seeds to 1234 ...
2023-04-20 01:44:26
[2023-04-20 01:44:25,685] [INFO] [checkpointing.py:223:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
2023-04-20 01:44:26
make: Entering directory '/fsx/polyglot.train/gpt-neox/megatron/data'
2023-04-20 01:44:26
make: Nothing to be done for 'default'.
2023-04-20 01:44:26
make: Leaving directory '/fsx/polyglot.train/gpt-neox/megatron/data'
2023-04-20 01:44:26
building GPT2 model ...
2023-04-20 01:44:26
FAILED !!!
2023-04-20 01:44:27
Traceback (most recent call last):
2023-04-20 01:44:27
  File "/fsx/polyglot.train/gpt-neox/train.py", line 27, in <module>
2023-04-20 01:44:27
    pretrain(neox_args=neox_args)
2023-04-20 01:44:27
  File "/fsx/polyglot.train/gpt-neox/megatron/training.py", line 104, in pretrain
2023-04-20 01:44:27
    model, optimizer, lr_scheduler = setup_model_and_optimizer(
2023-04-20 01:44:27
  File "/fsx/polyglot.train/gpt-neox/megatron/training.py", line 440, in setup_model_and_optimizer
2023-04-20 01:44:27
    optimizer, param_groups = get_optimizer(model=model, neox_args=neox_args)
2023-04-20 01:44:27
UnboundLocalError: local variable 'model' referenced before assignment
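One plausible reading of this trace: the `FAILED !!!` line printed right after `building GPT2 model ...` may come from an exception handler around model construction that logs the failure and continues without re-raising. That is an assumption, because the handler's source is not shown in this log. If it holds, `model` is never bound, and the first later use of it raises `UnboundLocalError`, which hides the real cause. The sketch below reproduces that pattern with hypothetical stand-ins (`build_model`, `setup_model_and_optimizer`, `setup_model_and_optimizer_fixed`). These are not the gpt-neox functions.

```python
def build_model():
    # Hypothetical stand-in for the real model builder; it always fails
    # here to mimic whatever went wrong during "building GPT2 model ...".
    raise RuntimeError("model construction failed")


def setup_model_and_optimizer():
    # Hypothetical: the failure is caught and logged but swallowed.
    try:
        model = build_model()
    except Exception:
        print("FAILED !!!")
    # 'model' was never assigned, so this line raises UnboundLocalError
    # instead of the original RuntimeError.
    return model


def setup_model_and_optimizer_fixed():
    # Same flow, but the handler re-raises so the real error surfaces.
    try:
        model = build_model()
    except Exception:
        print("FAILED !!!")
        raise
    return model
```

Calling the fixed variant ends with the original `RuntimeError`, which points at the construction failure itself. In a real run, the useful information is whatever exception was swallowed before `FAILED !!!` was printed. The exact `UnboundLocalError` message text also varies across Python versions.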