cjreinforce / openrlhf_train_ppo
Chrisjina's workspace
Runs (59 total, 19 visualized)
ppo_0225T01:08
VR_deepseek_1.5B_deepscaler-ds_VR_coef_5.0_deepseek_PRM_k3-kl
PRMVR_deepseek_1.5B_deepscaler-ds_VR_coef_5.0_deepseek_PRM_k3-kl
VR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_1.0_fix_template_bug
PRM_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_fix_template_bug_hacking_KL
PRMVR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_1.0_800_ds_fix_template_bug
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_new_80k_ds_fix_template_bug
PRMVR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_0.1_fix_template_bug
PRMVR_roll_bs_64_lr_3e-7_kl_coef_1e-3_t_0.5_VR_coef_1
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_new_80k_ds
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_VR_coef_1
VR_roll_bs_32_kl_coef_1e-3_t_0.5
PRMVR_roll_bs_32_kl_coef_1e-2_t_0.6
PRMVR_roll_bs_32_kl_coef_1e-3_t_0.6_baseline_step_hacking
PRM_roll_bs_32_kl_coef_1e-3_t_0.6_hacking
VR_roll_bs_32_kl_coef_1e-3_t_0.6
PRMVR_vllm_roll_bs_128_kl_coef_1e-3_t_0.6
PRM+VR_vllm_rollout_bs_256_kl_coef_1e-3_t_0.6_hacking_non-specified_separator
PRM+VR_vllm_roll_bs_32_kl_coef_1e-3_bug_in_lr_sched
PRM+VR_vllm_rollout_bs_256_kl_coef_1e-3_hacking_non-specified_separator
PRM+VR_rollout_bs_32_kl_coef_1e-4
PRM+VR_rollout_bs_32_kl_coef_1e-3
PRM_kl_coef_1e-3_rollout_bs_32_hacking
PRM_kl_coef_1e-2_rollout_bs_32
train (14 panels)
train/solved (x-axis: Step, 0 to 1k; y-axis: 0 to 0.8; first 10 runs shown)
train/reward (x-axis: Step, 0 to 800; y-axis: 0 to 0.8; first 10 runs shown)
train/match (x-axis: Step, 0 to 800; y-axis: 0 to 1; first 10 runs shown)
train/kl (x-axis: Step, 0 to 800; y-axis: -0.4 to 0.4; first 10 runs shown)
train/num_steps (x-axis: Step, 0 to 800; y-axis: 20 to 100; first 10 runs shown)
train/response_length (x-axis: Step, 0 to 800; y-axis: 600 to 2000; first 10 runs shown)
train/avg_ratio (x-axis: Step, 0 to 800; y-axis: 2 to 10; first 10 runs shown)
train/have_answers (x-axis: Step, 0 to 800; y-axis: 0 to 1)
System (23 panels)