Atmallen8's group workspace

2022-09-17 11:42:13
[2022-09-17 11:42:12,914] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/1.3B-largeBS/zero_to_fp32.py
2022-09-17 11:42:13
[2022-09-17 11:42:12,937] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/1.3B-largeBS/global_step71250/zero_pp_rank_0_mp_rank_00_optim_states.pt
2022-09-17 11:42:21
---------------------------------------------------------------------------------------------------------------------------
2022-09-17 11:42:21
 validation results at the end of training for val data | lm_loss value: 1.988313E+00 | lm_loss_ppl value: 7.303201E+00 |
2022-09-17 11:42:21
---------------------------------------------------------------------------------------------------------------------------
2022-09-17 11:42:29
[2022-09-17 11:42:27,999] [INFO] [logging.py:60:log_dist] [Rank 0] Saving model checkpoint: /fsx/hailey/pythia/ckpts/1.3B-largeBS/global_step71250/mp_rank_00_model_states.pt
2022-09-17 11:42:29
[2022-09-17 11:42:28,365] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/1.3B-largeBS/zero_to_fp32.py
2022-09-17 11:42:29
[2022-09-17 11:42:28,444] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/1.3B-largeBS/global_step71250/zero_pp_rank_0_mp_rank_00_optim_states.pt
2022-09-17 11:42:35
Evaluating iter 10/10
2022-09-17 11:42:37
----------------------------------------------------------------------------------------------------------------------------
2022-09-17 11:42:37
 validation results at the end of training for test data | lm_loss value: 1.979522E+00 | lm_loss_ppl value: 7.239280E+00 |
2022-09-17 11:42:37
----------------------------------------------------------------------------------------------------------------------------