Igoro's group workspace

2022-02-27 05:26:04  make: Entering directory '/home/mchorse/gpt-neox/megatron/data'
2022-02-27 05:26:04  make: Nothing to be done for 'default'.
2022-02-27 05:26:04  make: Leaving directory '/home/mchorse/gpt-neox/megatron/data'
2022-02-27 05:26:06  [2022-02-27 05:26:05,142] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
2022-02-27 05:26:08  Using /home/mchorse/.cache/torch_extensions as PyTorch extensions root...
2022-02-27 05:26:10  Loading extension module utils...
2022-02-27 05:26:10  Time to load utils op: 0.6037061214447021 seconds
2022-02-27 05:26:10  [2022-02-27 05:26:08,362] [INFO] [stage1.py:160:__init__] ZeRO Elastic Checkpoint = True
2022-02-27 05:26:12  Using /home/mchorse/.cache/torch_extensions as PyTorch extensions root...
2022-02-27 05:26:12  No modifications detected for re-loaded extension module utils, skipping build step...
2022-02-27 05:26:12  Loading extension module utils...
2022-02-27 05:26:12  Time to load utils op: 0.0011687278747558594 seconds
2022-02-27 05:26:26  [2022-02-27 05:26:25,159] [INFO] [engine.py:1551:_load_checkpoint] rank: 32 loading checkpoint: /mnt/ssd-1/20B_P3/global_step151000/mp_rank_04_model_states.pt
2022-02-27 05:28:15  successfully loaded 6 ZeRO state_dicts for rank 32
2022-02-27 05:28:25  loading 6 zero partition checkpoints for rank 32
2022-02-27 10:38:08  [2022-02-27 10:38:06,594] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /mnt/ssd-1/20B_P3/zero_to_fp32.py
2022-02-27 10:38:08  [2022-02-27 10:38:06,602] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /mnt/ssd-1/20B_P3/global_step151500/zero_pp_rank_4_mp_rank_04_optim_states.pt