
Eleutherai-oslo's group workspace

2023-04-19 16:02:11     model, optimizer, lr_scheduler = setup_model_and_optimizer(
2023-04-19 16:02:11   File "/fsx/polyglot.train/gpt-neox/megatron/training.py", line 440, in setup_model_and_optimizer
2023-04-19 16:02:11     optimizer, param_groups = get_optimizer(model=model, neox_args=neox_args)
2023-04-19 16:02:11   File "/fsx/polyglot.train/gpt-neox/megatron/training.py", line 382, in get_optimizer
2023-04-19 16:02:11     optimizer = adam_optimizer(
2023-04-19 16:02:11   File "/fsx/gpt-neox/conda/envs/improved-t5/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 72, in __init__
2023-04-19 16:02:11     fused_adam_cuda = FusedAdamBuilder().load()
2023-04-19 16:02:11   File "/fsx/gpt-neox/conda/envs/improved-t5/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 215, in load
2023-04-19 16:02:11     return self.jit_load(verbose)
2023-04-19 16:02:11   File "/fsx/gpt-neox/conda/envs/improved-t5/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 230, in jit_load
2023-04-19 16:02:11     assert_no_cuda_mismatch()
2023-04-19 16:02:11   File "/fsx/gpt-neox/conda/envs/improved-t5/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 59, in assert_no_cuda_mismatch
2023-04-19 16:02:11     raise Exception(
2023-04-19 16:02:11 Exception: Installed CUDA version 11.8 does not match the version torch was compiled with 11.7, unable to compile cuda/cpp extensions without a matching cuda version.
2023-04-19 16:02:11 Configuring Optimizer type: Adam with params: {'lr': 0.0001, 'betas': [0.9, 0.95], 'eps': 1e-08}
2023-04-19 16:02:11 WARNING: APEX not installed - defaulting to deepspeed's fused adam
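The sequence of events in the traceback is as follows:

1. APEX is not installed, so get_optimizer falls back to DeepSpeed's fused Adam (see the WARNING line).
2. Constructing FusedAdam calls FusedAdamBuilder().load(), which JIT-compiles the CUDA extension.
3. The JIT build stops in assert_no_cuda_mismatch, because the installed CUDA toolkit (11.8) differs from the CUDA version torch was built against (11.7).

Below is a minimal, hypothetical sketch of such a version guard. The function names are illustrative and are not DeepSpeed's API. Comparing only major.minor and ignoring the patch level is an assumption. It is consistent with the 11.8 vs 11.7 failure, but it is not confirmed by the log.

```python
# Hypothetical sketch of a CUDA version-match guard, modelled on the error
# message in the log above. Names are illustrative, not DeepSpeed's API.
from typing import Tuple


def parse_cuda_version(version: str) -> Tuple[int, int]:
    """Reduce a version string such as '11.8' or '11.7.1' to (major, minor)."""
    major, minor = version.strip().split(".")[:2]
    return int(major), int(minor)


def cuda_versions_match(installed: str, torch_compiled: str) -> bool:
    """Assumed rule: major and minor must both be equal; patch is ignored."""
    return parse_cuda_version(installed) == parse_cuda_version(torch_compiled)


def assert_cuda_match(installed: str, torch_compiled: str) -> None:
    """Refuse to JIT-build extensions when the two CUDA versions differ."""
    if not cuda_versions_match(installed, torch_compiled):
        raise RuntimeError(
            f"Installed CUDA version {installed} does not match the version "
            f"torch was compiled with {torch_compiled}, unable to compile "
            "cuda/cpp extensions without a matching cuda version."
        )


if __name__ == "__main__":
    print(cuda_versions_match("11.7", "11.7.1"))  # True
    print(cuda_versions_match("11.8", "11.7"))    # False: the case in the log
```

In a live environment, the two inputs would come from different places:

- The compiled-against version comes from `torch.version.cuda`.
- The installed toolkit version comes from `nvcc --version` in the active CUDA installation.

The sketch takes both as plain strings, so it runs without either being present. Under this rule, either of these changes would make the check pass:

- Install a PyTorch build compiled for CUDA 11.8.
- Point the build at a CUDA 11.7 toolkit.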