rx31 / SpuriousRewardRLVR
Runs (5)
Random Reward (qwen2.5_math_7b-DeepScaleR-RLVR-random0.5-lr5e-7-kl0.00)
Format Reward (qwen2.5_math_7b-DeepScaleR-RLVR-box_only_format-lr5e-7-kl0.00)
Incorrect Label (qwen2.5_math_7b-DeepScaleR_mv_labeled_qwen2.5_math_7b_incorrect-RLVR-math-lr5e-7-kl0.00)
Majority Vote (qwen2.5_math_7b-DeepScaleR_mv_labeled_qwen2.5_math_7b-RLVR-math-lr5e-7-kl0.00)
Ground Truth (qwen2.5_math_7b-DeepScaleR-RLVR-math-lr5e-7-kl0.00)
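The run names appear to encode the model (qwen2.5_math_7b), the training data (DeepScaleR), the reward variant, the learning rate (5e-7) and the KL coefficient (0.00). The sketch below is a hypothetical illustration of how the five reward variants could be computed; it is not this project's code. The function names, the rule of taking the last `\boxed{...}` as the answer, the reading of "box_only_format" as "any boxed answer earns reward", and the reading of "random0.5" as a Bernoulli(0.5) reward are all assumptions. Under this reading, "Ground Truth", "Majority Vote" and "Incorrect Label" share one reward function and differ only in which label is supplied.

```python
import random
import re
from collections import Counter
from typing import List, Optional

# Hypothetical answer extractor: the last \boxed{...} whose content has no nested braces.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the response, or None."""
    matches = _BOXED.findall(response)
    return matches[-1].strip() if matches else None


def label_reward(response: str, label: Optional[str]) -> float:
    """1.0 if the boxed answer equals `label`, else 0.0.

    Assumed to serve "Ground Truth" (true answer), "Majority Vote" (voted answer)
    and "Incorrect Label" (a deliberately wrong answer) alike.
    """
    answer = extract_boxed(response)
    return 1.0 if answer is not None and answer == label else 0.0


def format_reward(response: str) -> float:
    """Assumed "box_only_format": reward any response containing a boxed answer."""
    return 1.0 if extract_boxed(response) is not None else 0.0


def random_reward(rng: random.Random, p: float = 0.5) -> float:
    """Assumed "random0.5": a Bernoulli(p) reward that ignores the response."""
    return 1.0 if rng.random() < p else 0.0


def majority_vote_label(samples: List[str]) -> Optional[str]:
    """Most common boxed answer among the model's own samples (ties: first seen)."""
    answers = [a for a in map(extract_boxed, samples) if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None
```

For example, `majority_vote_label(["\\boxed{3}", "\\boxed{3}", "\\boxed{5}"])` returns `"3"`, and that label can then be passed to `label_reward`.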
eval (17 panels, 8 shown; x-axis of each: eval/global_step, 0 to 200)
[Plot: MATH500 avg@1, y-axis ticks 0.55 to 0.75]
[Plot: AMC avg@8, y-axis ticks 0.3 to 0.6]
[Plot: AIME2024 avg@8, y-axis ticks 0.1 to 0.25]
[Plot: AIME2025 avg@8, y-axis ticks 0.04 to 0.14]
[Plot: MATH500 Code Frequency avg@1, y-axis ticks 0.65 to 0.9]
[Plot: AMC Code Frequency avg@8, y-axis ticks 0.65 to 0.95]
[Plot: AIME2024 Code Frequency avg@8, y-axis ticks 0.6 to 0.9]
[Plot: AIME2025 Code Frequency avg@8, y-axis ticks 0.5 to 0.9]
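The eval panels report MATH500 at avg@1 and AMC, AIME2024 and AIME2025 at avg@8. The dashboard does not define avg@k; a common reading, assumed here, is that each problem's accuracy is averaged over k sampled responses and the result is then averaged over problems. A minimal sketch under that assumption (the function name `avg_at_k` is hypothetical):

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Assumed avg@k: mean over problems of the mean correctness of k samples.

    correct[i][j] is True if the j-th of k sampled responses to problem i
    was judged correct.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        per_problem.append(sum(samples) / len(samples))
    return sum(per_problem) / len(per_problem)
```

With k = 8, a problem solved in all 8 samples and another solved in 2 of 8 give (1.0 + 0.25) / 2 = 0.625; with k = 1 the metric reduces to plain accuracy.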
train (16 panels)
System (29 panels)