Upup-ashton-wang's group workspace

2025-08-07 07:42:14    "transformers_version": "4.51.1",
2025-08-07 07:42:14    "use_cache": true,
2025-08-07 07:42:14    "use_mrope": false,
2025-08-07 07:42:14    "use_sliding_window": false,
2025-08-07 07:42:14    "vocab_size": 151936
2025-08-07 07:42:14  }
2025-08-07 07:42:14
2025-08-07 07:42:14  tokenizer config file saved in /project/neiswang_1391/shangsha/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/sae_tuning_deepscaler/DeepSeek-R1-Distill-Qwen-1.5B_grpo_still_checkpoint-0/trained_from_scratch_deepscaler_model.layers.12/tokenizer_config.json
2025-08-07 07:42:14  Special tokens file saved in /project/neiswang_1391/shangsha/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/sae_tuning_deepscaler/DeepSeek-R1-Distill-Qwen-1.5B_grpo_still_checkpoint-0/trained_from_scratch_deepscaler_model.layers.12/special_tokens_map.json
2025-08-07 07:42:14  Final model saved to /project/neiswang_1391/shangsha/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/sae_tuning_deepscaler/DeepSeek-R1-Distill-Qwen-1.5B_grpo_still_checkpoint-0/trained_from_scratch_deepscaler_model.layers.12
2025-08-07 07:42:14  Training finished.