Eleutherai-oslo's group workspace

2023-04-19 17:34:10  [2023-04-19 17:34:08,458] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
2023-04-19 17:34:10  make: Entering directory '/fsx/polyglot.train/gpt-neox/megatron/data'
2023-04-19 17:34:10  make: Nothing to be done for 'default'.
2023-04-19 17:34:10  make: Leaving directory '/fsx/polyglot.train/gpt-neox/megatron/data'
2023-04-19 17:34:10  WARNING: APEX not installed - defaulting to deepspeed's fused adam
2023-04-19 17:34:12  Using /fsx/polyglot.train/torch_extensions/ as PyTorch extensions root...
2023-04-19 17:34:12  Detected CUDA files, patching ldflags
2023-04-19 17:34:12  Emitting ninja build file /fsx/polyglot.train/torch_extensions/fused_adam/build.ninja...
2023-04-19 17:34:12  Building extension module fused_adam...
2023-04-19 17:34:12  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
2023-04-19 17:34:12  Loading extension module fused_adam...
2023-04-19 17:34:12  /fsx/gpt-neox/conda/envs/improved-t5/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead
2023-04-19 17:34:12    warnings.warn(
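
The UserWarning above only flags a rename inside PyTorch: callers should move from the private _get_global_rank helper to the public get_global_rank. A minimal sketch of the change, assuming a process-group handle named group and a group-local rank group_rank (both illustrative, not taken from this log):

    import torch.distributed as dist

    # Deprecated private helper (what the stack still calls here):
    # rank = dist.distributed_c10d._get_global_rank(group, group_rank)

    # Public replacement named in the warning:
    rank = dist.distributed_c10d.get_global_rank(group, group_rank)

Both calls translate a rank within a sub-group back to the global rank across all processes; the warning does not affect this run.
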
2023-04-19 17:34:12  ninja: no work to do.
2023-04-19 17:34:12  Time to load fused_adam op: 0.39587974548339844 seconds
2023-04-19 17:34:12  [2023-04-19 17:34:10,574] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
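
The final DeepSpeed warning points at a config rename rather than a problem: the deprecated cpu_offload flag should be expressed as an offload_optimizer section instead. A minimal sketch of an updated zero_optimization block, assuming ZeRO stage 2 with CPU offload (the stage and values are illustrative, not this run's actual config):

    # Hypothetical DeepSpeed config dict; only the offload keys mirror the warning.
    ds_config = {
        "zero_optimization": {
            "stage": 2,
            # "cpu_offload": True,                    # deprecated spelling
            "offload_optimizer": {"device": "cpu"},   # replacement
        },
    }

In a gpt-neox setup the same keys would normally live in the DeepSpeed/NeoX config file rather than an inline dict; the warning itself does not change behavior, it only signals that the old key is still being read.
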