Eleutherai-oslo's group workspace

2023-04-19 17:34:10  [2023-04-19 17:34:08,458] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
2023-04-19 17:34:10  make: Entering directory '/fsx/polyglot.train/gpt-neox/megatron/data'
2023-04-19 17:34:10  make: Nothing to be done for 'default'.
2023-04-19 17:34:10  make: Leaving directory '/fsx/polyglot.train/gpt-neox/megatron/data'
2023-04-19 17:34:10  WARNING: APEX not installed - defaulting to deepspeed's fused adam
2023-04-19 17:34:12  Using /fsx/polyglot.train/torch_extensions/ as PyTorch extensions root...
2023-04-19 17:34:12  Detected CUDA files, patching ldflags
2023-04-19 17:34:12  Emitting ninja build file /fsx/polyglot.train/torch_extensions/fused_adam/build.ninja...
2023-04-19 17:34:12  Building extension module fused_adam...
2023-04-19 17:34:12  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
2023-04-19 17:34:12  Loading extension module fused_adam...
2023-04-19 17:34:12  /fsx/gpt-neox/conda/envs/improved-t5/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead
2023-04-19 17:34:12    warnings.warn(
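
The UserWarning above only flags a rename inside PyTorch: callers should move from the private _get_global_rank helper to the public get_global_rank. A minimal sketch of the change, assuming a process-group handle named group and a group-local rank group_rank (both illustrative, not taken from this log):

    import torch.distributed as dist

    # Deprecated private helper (what the stack still calls here):
    # rank = dist.distributed_c10d._get_global_rank(group, group_rank)

    # Public replacement named in the warning:
    rank = dist.distributed_c10d.get_global_rank(group, group_rank)

Both calls translate a rank within a sub-group back to the global rank across all processes; the warning does not affect this run.
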
2023-04-19 17:34:12  ninja: no work to do.
2023-04-19 17:34:12  Time to load fused_adam op: 0.39587974548339844 seconds
2023-04-19 17:34:12  [2023-04-19 17:34:10,574] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
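
The final DeepSpeed warning points at a config rename rather than a problem: the deprecated cpu_offload flag should be expressed as an offload_optimizer section instead. A minimal sketch of an updated zero_optimization block, assuming ZeRO stage 2 with CPU offload (the stage and values are illustrative, not this run's actual config):

    # Hypothetical DeepSpeed config dict; only the offload keys mirror the warning.
    ds_config = {
        "zero_optimization": {
            "stage": 2,
            # "cpu_offload": True,                    # deprecated spelling
            "offload_optimizer": {"device": "cpu"},   # replacement
        },
    }

In a gpt-neox setup the same keys would normally live in the DeepSpeed/NeoX config file rather than an inline dict; the warning itself does not change behavior, it only signals that the old key is still being read.
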