Upup-ashton-wang's group workspace
Algorithm ablation
Tags
DeepScaleR-CoT-based-SFT
Notes
Author
State
Finished
Start time
May 17th, 2025 5:21:50 AM
Runtime
3h 14m 7s
Tracked hours
-
Run path
upup-ashton-wang-usc/Resa/q0sf8w9y
OS
Linux-4.18.0-553.22.1.el8_10.x86_64-x86_64-with-glibc2.28
Python version
CPython 3.10.16
Command
/home1/shangsha/workspace/reasoning/reasoning-sae/./resee/post_train_hf/sft.py --config ./recipes/DeepSeek-R1-Distill-Qwen-1.5B/sft/model_curated_deepscaler.yaml
System Hardware
| CPU count | 64 |
| Logical CPU count | 64 |
| GPU count | 2 |
| GPU type | NVIDIA L40S |
W&B CLI Version
0.19.9
Group
Algorithm ablation
Config
Config parameters are your model's inputs.
- {} 215 keys
- true
- "/project/neiswang_1391/shangsha/reasoning/reasoning-sae/ckpts/models/DeepSeek-R1-Distill-Qwen-1.5B/base"
- {} 6 keys
- false
- 0.9
- 0.999
- 0.00000001
- false
- [] 1 item
- "Qwen2ForCausalLM"
- 0
- false
- false
- null
- false
- null
- true
- false
- 151,643
- "<CHARS_PER_TOKEN>"
- 0
- null
- null
- false
- 0
- false
- true
- null
- null
- null
- null
- "text"
- null
- null
- null
- null
- 1,800
- [] 0 items
- null
- null
- false
- 0
- true
- false
- false
- false
- false
- 151,936
- 0.05
- 0
- 0
Summary
Summary metrics are your model's outputs.
- {} 12 keys
- "table-file"
- 633,150,436,145,102,848
- 0.9246609556138832
- 11,641.2394
- 10.389
- 0.649
- 3
- 7,560
- 1.4775604009628296
- 0.00000000139236981342
- 0.936
- 0.7757900953292847
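The config and summary values in this export are listed without their key names. A minimal sketch of how the named values could be retrieved from the run path listed above, assuming the `wandb` Python client is installed and the caller has read access to the project. The helper names `parse_run_path` and `fetch_config_and_summary` are hypothetical and introduced only for illustration.

```python
from typing import Any, Dict, Tuple


def parse_run_path(run_path: str) -> Tuple[str, str, str]:
    """Split an 'entity/project/run_id' path into its three parts."""
    parts = run_path.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'entity/project/run_id', got {run_path!r}")
    return parts[0], parts[1], parts[2]


def fetch_config_and_summary(run_path: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (config, summary) for a run, with key names intact.

    Assumption: network access and W&B credentials are available.
    """
    import wandb  # imported lazily so parse_run_path works without wandb

    parse_run_path(run_path)  # validate the path shape before calling the API
    run = wandb.Api().run(run_path)
    # Skip internal keys that start with an underscore.
    config = {k: v for k, v in run.config.items() if not k.startswith("_")}
    summary = dict(run.summary._json_dict)
    return config, summary


# Run path copied from the "Run path" field of this page.
RUN_PATH = "upup-ashton-wang-usc/Resa/q0sf8w9y"
entity, project, run_id = parse_run_path(RUN_PATH)
```

Calling `fetch_config_and_summary(RUN_PATH)` would then return two dictionaries whose keys identify each of the values shown in the Config and Summary lists.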
Artifact Outputs
This run produced these artifacts as outputs. Total: 1.
Type
Name
Consumer count