upup-ashton-wang-usc

Upup-ashton-wang's group workspace

Group: Source Model Ablation (Trained-from-Scatch SAE)

2025-06-01 10:08:43 "tie_word_embeddings": false,
2025-06-01 10:08:43 "torch_dtype": "bfloat16",
2025-06-01 10:08:43 "transformers_version": "4.50.0",
2025-06-01 10:08:43 "use_cache": true,
2025-06-01 10:08:43 "use_mrope": false,
2025-06-01 10:08:43 "use_sliding_window": false,
2025-06-01 10:08:43 "vocab_size": 151936
2025-06-01 10:08:43 }
2025-06-01 10:08:43 tokenizer config file saved in /home/omer/shangshang/project/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/grpo_curated_still/checkpoint-3000/distill/curated_still/pretrain/model.layers.12/sft_r1_distill/tokenizer_config.json
2025-06-01 10:08:43 Special tokens file saved in /home/omer/shangshang/project/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/grpo_curated_still/checkpoint-3000/distill/curated_still/pretrain/model.layers.12/sft_r1_distill/special_tokens_map.json
2025-06-01 10:08:43 Final model saved to /home/omer/shangshang/project/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/grpo_curated_still/checkpoint-3000/distill/curated_still/pretrain/model.layers.12/sft_r1_distill
2025-06-01 10:08:43 Training finished.
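A minimal sketch for checking the saved run against this log, not part of the original run. It assumes the final model directory holds a Hugging Face style `config.json` (as `save_pretrained` normally writes), which this log fragment does not show directly. The names `LOGGED_FIELDS` and `diff_against_log` are hypothetical; the values are copied from the config lines printed above.

```python
import json
from pathlib import Path

# Config fields visible in the logged fragment above (the rest of the config was cut off).
LOGGED_FIELDS = {
    "tie_word_embeddings": False,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.50.0",
    "use_cache": True,
    "use_mrope": False,
    "use_sliding_window": False,
    "vocab_size": 151936,
}


def diff_against_log(model_dir):
    """Return {key: (logged_value, saved_value)} for each logged field that differs.

    Hypothetical helper: assumes model_dir contains a config.json.
    A missing key is reported with saved_value None.
    """
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return {
        key: (expected, config.get(key))
        for key, expected in LOGGED_FIELDS.items()
        if config.get(key) != expected
    }
```

Pass the directory from the "Final model saved to" line; an empty dict means every field printed in the log matches the saved config.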