rx31
SpuriousRewardRLVR
Runs (5)
Random Reward (qwen2.5_math_7b-DeepScaleR-RLVR-random0.5-lr5e-7-kl0.00)
Format Reward (qwen2.5_math_7b-DeepScaleR-RLVR-box_only_format-lr5e-7-kl0.00)
Incorrect Label (qwen2.5_math_7b-DeepScaleR_mv_labeled_qwen2.5_math_7b_incorrect-RLVR-math-lr5e-7-kl0.00)
Majority Vote (qwen2.5_math_7b-DeepScaleR_mv_labeled_qwen2.5_math_7b-RLVR-math-lr5e-7-kl0.00)
Ground Truth (qwen2.5_math_7b-DeepScaleR-RLVR-math-lr5e-7-kl0.00)
[Figure: AIME2025 Code Frequency avg@8. x-axis: eval/global_step (ticks 0 to 200); y-axis ticks 0.5 to 0.9. One line per run: Random Reward, Format Reward, Incorrect Label, Majority Vote, Ground Truth.]