Upup-ashton-wang's group workspace
Source Model Ablation (Trained-from-Scratch SAE)
Tags
Resa-STILL-Tina-3000-step (Trained-from-Scratch SAE)
State
Finished
Start time
June 1st, 2025 9:50:21 AM
Runtime
18m 23s
Tracked hours
18m 22s
Run path
upup-ashton-wang-usc/Resa/8vmie0yi
OS
Linux-5.15.0-92-generic-x86_64-with-glibc2.35
Python version
CPython 3.10.16
Command
/home/omer/shangshang/workspace/reasoning/reasoning-sae/./scripts/train/run_sae_based_distill.py --config ./recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/distill_curated_still.yaml --host_model_checkpoint checkpoint-3000 --student_model_name DeepSeek-R1-Distill-Qwen-1.5B --distill_dataset_name curated_still --distill_type sft_r1_distill --sae_hookpoint model.layers.12 --sae_type reason_pretrained
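The launch command above can be broken into its flag/value pairs for inspection. A minimal sketch using Python's `shlex`, with the command string copied verbatim from the run metadata (the parser assumes every `--flag` takes exactly one value, which holds for this command):

```python
import shlex

# Launch command copied verbatim from the run metadata above.
cmd = (
    "/home/omer/shangshang/workspace/reasoning/reasoning-sae/./scripts/train/run_sae_based_distill.py "
    "--config ./recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/distill_curated_still.yaml "
    "--host_model_checkpoint checkpoint-3000 "
    "--student_model_name DeepSeek-R1-Distill-Qwen-1.5B "
    "--distill_dataset_name curated_still "
    "--distill_type sft_r1_distill "
    "--sae_hookpoint model.layers.12 "
    "--sae_type reason_pretrained"
)

def parse_flags(command: str) -> dict:
    """Split a CLI string into {flag: value} pairs.

    Assumes every --flag is followed by exactly one value token.
    """
    tokens = shlex.split(command)
    args = tokens[1:]  # tokens[0] is the script path
    return {args[i].lstrip("-"): args[i + 1] for i in range(0, len(args), 2)}

flags = parse_flags(cmd)
print(flags["sae_hookpoint"])  # model.layers.12
```

This makes it easy to diff launch commands across ablation runs (e.g. comparing `--sae_type` or `--sae_hookpoint` between group members).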
System Hardware
| CPU count | 32 |
| Logical CPU count | 64 |
| GPU count | 8 |
| GPU type | NVIDIA RTX 6000 Ada Generation |
W&B CLI Version
0.19.8
Config
- 20 keys (key names collapsed in this export):
- "DeepSeek-R1-Distill-Qwen-1.5B"
- 1
- "curated_still"
- "sft_r1_distill"
- "checkpoint-3000"
- "curated_still"
- "grpo"
- 0.000001
- 1
- 128
- 0.05
- 32
- [7-item list, collapsed in this export]
- 2
- "model.layers.12"
- "sae-DeepSeek-R1-Distill-Qwen-1.5B-65k"
- "reason_pretrained"
- 500
- 42
- "DeepSeek-R1-Distill-Qwen-1.5B"
Summary
- 5 keys (key names collapsed in this export):
- 1
- 188.6456310679612
- 0.000001
- 2.21875
- 258
Artifact Outputs
This run produced 1 artifact as output.