cjreinforce / openrlhf_train_ppo
Chrisjina's workspace
Runs (59 total, 19 visualized)
ppo_0225T01:08
VR_deepseek_1.5B_deepscaler-ds_VR_coef_5.0_deepseek_PRM_k3-kl
PRMVR_deepseek_1.5B_deepscaler-ds_VR_coef_5.0_deepseek_PRM_k3-kl
VR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_1.0_fix_template_bug
PRM_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_fix_template_bug_hacking_KL
PRMVR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_1.0_800_ds_fix_template_bug
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_new_80k_ds_fix_template_bug
PRMVR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_0.1_fix_template_bug
PRMVR_roll_bs_64_lr_3e-7_kl_coef_1e-3_t_0.5_VR_coef_1
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_new_80k_ds
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_VR_coef_1
VR_roll_bs_32_kl_coef_1e-3_t_0.5
PRMVR_roll_bs_32_kl_coef_1e-2_t_0.6
PRMVR_roll_bs_32_kl_coef_1e-3_t_0.6_baseline_step_hacking
PRM_roll_bs_32_kl_coef_1e-3_t_0.6_hacking
VR_roll_bs_32_kl_coef_1e-3_t_0.6
PRMVR_vllm_roll_bs_128_kl_coef_1e-3_t_0.6
PRM+VR_vllm_rollout_bs_256_kl_coef_1e-3_t_0.6_hacking_non-specified_separator
PRM+VR_vllm_roll_bs_32_kl_coef_1e-3_bug_in_lr_sched
PRM+VR_vllm_rollout_bs_256_kl_coef_1e-3_hacking_non-specified_separator
PRM+VR_rollout_bs_32_kl_coef_1e-4
PRM+VR_rollout_bs_32_kl_coef_1e-3
PRM_kl_coef_1e-3_rollout_bs_32_hacking
PRM_kl_coef_1e-2_rollout_bs_32
train (14 panels)
[Plot: train/solved vs. Step]
[Plot: train/reward vs. Step]
[Plot: train/match vs. Step]
[Plot: train/kl vs. Step]
[Plot: train/num_steps vs. Step]
[Plot: train/response_length vs. Step]
[Plot: train/avg_ratio vs. Step]
[Plot: train/have_answers vs. Step]
System (23 panels)