yipingwanguw / verl_few_shot
Yipingwang22's workspace
Runs (21 total, 2 visualized)
dpsk_distill_1.5b, 1-shot RLVR with pi_1
dpsk_distill_1.5b, 16-shot RLVR with pi_{1},..., pi_{16}
dpsk_distill_1.5b, 4-shot RLVR with pi_{1}, pi_{2}, pi_{13}, pi_{1209}}
dpsk_distill_1.5b, RLVR with DSR-sub
Qwen2.5-Math-1.5B, 1-shot RLVR with pi_{13}
Qwen2.5-Math-1.5B, 1-shot RLVR with pi_{13}
Qwen2.5-Math-1.5B, 1-shot RLVR with pi_{13}
Qwen2.5-Math-1.5B, 1-shot RLVR with pi_{1}
Qwen2.5-Math-1.5B, 1-shot RLVR with pi_{1}
Qwen2.5-Math-1.5B, 2-shot RLVR with pi_{1}, pi_{13}
Qwen2.5-Math-1.5B, RLVR with DSR-sub
Qwen2.5-Math-1.5B, RLVR with DSR-sub
Qwen2.5-Math-1.5B-pi1_r128
Qwen2.5-Math-7B, 1-shot RLVR with pi_1
Qwen2.5-Math-7B, 1-shot RLVR with pi_1
Qwen2.5-Math-7B, 16-shot RLVR with pi_{1},..., pi_{16}
Qwen2.5-Math-7B, 2-shot RLVR with pi_{1}, pi_{13}
Qwen2.5-Math-7B, 2-shot RLVR with pi_{1}, pi_{13}
Qwen2.5-Math-7B, 4-shot RLVR with pi_{1}, pi_{2}, pi_{13}, pi_{1209}}
Qwen2.5-Math-7B, RLVR with DSR-sub
1-20 of 21
[Plot: actor/ppo_kl vs. Step. Legend: Qwen2.5-Math-1.5B, 1-shot RLVR with pi_{1}]