Upup-ashton-wang's group workspace

Resa - Main Models

Resa-DeepScaleR-v3

State
Finished
Start time
August 7th, 2025 7:19:15 AM
Runtime
23m
Tracked hours
-
Run path
upup-ashton-wang-usc/Resa/fp7ukf74
OS
Linux-4.18.0-553.22.1.el8_10.x86_64-x86_64-with-glibc2.28
Python version
CPython 3.10.16
Command
/home1/shangsha/workspace/reasoning/reasoning-sae/./scripts/train/sae_tuning.py --config ./recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/sae_tuning.yaml --base_model_name DeepSeek-R1-Distill-Qwen-1.5B --source_model_post_train_dataset_name still --source_model_post_train_type grpo --source_model_checkpoint checkpoint-0 --sae_name sae-DeepSeek-R1-Distill-Qwen-1.5B-65k --sae_hookpoint model.layers.12 --trigger_dataset_name deepscaler --sae_type trained_from_scratch --target_model_name DeepSeek-R1-Distill-Qwen-1.5B --elicitation_dataset_name deepscaler
System Hardware
CPU count 64
Logical CPU count 64
GPU count 2
GPU type NVIDIA L40S
W&B CLI Version
0.19.9
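
The recorded command can be split into flag/value pairs to make it easier to compare against the config values listed on this page. The following is a minimal sketch using only the Python standard library; `parse_flags` is a hypothetical helper written for illustration, not part of the `sae_tuning.py` script, and it assumes every `--flag` is followed by exactly one value, as is the case in the command above.

```python
import shlex

# Command string copied verbatim from the run's "Command" field.
COMMAND = (
    "/home1/shangsha/workspace/reasoning/reasoning-sae/./scripts/train/sae_tuning.py "
    "--config ./recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/sae_tuning.yaml "
    "--base_model_name DeepSeek-R1-Distill-Qwen-1.5B "
    "--source_model_post_train_dataset_name still "
    "--source_model_post_train_type grpo "
    "--source_model_checkpoint checkpoint-0 "
    "--sae_name sae-DeepSeek-R1-Distill-Qwen-1.5B-65k "
    "--sae_hookpoint model.layers.12 "
    "--trigger_dataset_name deepscaler "
    "--sae_type trained_from_scratch "
    "--target_model_name DeepSeek-R1-Distill-Qwen-1.5B "
    "--elicitation_dataset_name deepscaler"
)


def parse_flags(command: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: return (script_path, {flag_name: value})."""
    tokens = shlex.split(command)
    script, rest = tokens[0], tokens[1:]
    flags: dict[str, str] = {}
    i = 0
    while i < len(rest):
        token = rest[i]
        if token.startswith("--"):
            name = token[2:]
            # Take the next token as the value unless it is another flag.
            if i + 1 < len(rest) and not rest[i + 1].startswith("--"):
                flags[name] = rest[i + 1]
                i += 2
                continue
            flags[name] = ""
        i += 1
    return script, flags


script, flags = parse_flags(COMMAND)
for name, value in flags.items():
    print(f"{name} = {value}")
```

Printing the pairs this way shows, for example, that `sae_hookpoint` is `model.layers.12` and that both `trigger_dataset_name` and `elicitation_dataset_name` are `deepscaler`, matching values that appear in the Config section.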
Config

Config parameters are your model's inputs.

  • {} 20 keys
    • "DeepSeek-R1-Distill-Qwen-1.5B"
    • 1
    • "deepscaler"
    • 0.000001
    • 1
    • 128
    • 0.05
    • 32
    • [] 7 items
    • 2
    • "model.layers.12"
    • "sae-DeepSeek-R1-Distill-Qwen-1.5B-65k"
    • "trained_from_scratch"
    • 500
    • 42
    • "checkpoint-0"
    • "still"
    • "grpo"
    • "DeepSeek-R1-Distill-Qwen-1.5B"
    • "deepscaler"
Summary

Summary metrics are your model's outputs.

  • {} 5 keys
    • 1
    • 200.14176346356916
    • 0.000001
    • 4.3125
    • 138
Artifact Outputs

This run produced these artifacts as outputs. Total: 1.