[2025-03-31 07:10:04,664] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /project/neiswang_1391/shangsha/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/grpo_curated_lima/checkpoint-360/global_step360/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2025-03-31 07:10:04,676] [INFO] [engine.py:3645:_save_zero_checkpoint] zero checkpoint saved /project/neiswang_1391/shangsha/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/grpo_curated_lima/checkpoint-360/global_step360/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2025-03-31 07:10:04,676] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step360 is ready now!
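The file saved above (bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt under checkpoint-360/global_step360/) is a DeepSpeed ZeRO shard rather than a plain PyTorch checkpoint. Below is a minimal sketch of consolidating such shards into a single fp32 state dict with DeepSpeed's zero_to_fp32 utility; the local directory and output filename are placeholders, and the global_step360 tag is taken from the commit line above.

    import torch
    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    # Hypothetical local copy of the checkpoint directory logged above.
    ckpt_dir = "ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/grpo_curated_lima/checkpoint-360"

    # Merge the per-rank ZeRO shards under <ckpt_dir>/global_step360/ into one fp32 state dict.
    state_dict = get_fp32_state_dict_from_zero_checkpoint(ckpt_dir, tag="global_step360")

    # Save in a form that plain torch.load / transformers can consume (placeholder filename).
    torch.save(state_dict, f"{ckpt_dir}/pytorch_model_fp32.bin")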
[INFO|trainer.py:2657] 2025-03-31 07:10:04,724 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

100%|██████████| 360/360 [12:21:38<00:00, 123.61s/it]
{'train_runtime': 44499.367, 'train_samples_per_second': 0.259, 'train_steps_per_second': 0.008, 'train_loss': 3.3997018676975205e-06, 'epoch': 2.08}
***** train metrics *****
  total_flos               =         0GF
  train_loss               =         0.0
  train_runtime            = 12:21:39.36
  train_samples_per_second =       0.259
  train_steps_per_second   =       0.008
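The summary numbers above are internally consistent: 360 steps / 44499.367 s ≈ 0.0081 steps/s (reported as 0.008), 44499.367 s / 360 ≈ 123.61 s/it as shown in the progress bar, and 0.259 samples/s × 44499 s ≈ 11.5k samples over 2.08 epochs. This block is the standard tail of a Hugging Face Trainer run; the sketch below shows the usual pattern that prints and persists it, assuming a transformers.Trainer instance named trainer (the variable name is hypothetical, not taken from this repository's code).

    # Sketch of the usual end-of-training sequence in transformers example scripts.
    train_result = trainer.train()
    metrics = train_result.metrics          # the dict logged above
    trainer.log_metrics("train", metrics)   # prints the "***** train metrics *****" table
    trainer.save_metrics("train", metrics)  # writes train_results.json into the output dir
    trainer.save_state()                    # saves trainer_state.json alongside the checkpoints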