cjreinforce / openrlhf_train_ppo
Chrisjina's workspace
Runs: 59 total, 19 visualized
ppo_0225T01:08
VR_deepseek_1.5B_deepscaler-ds_VR_coef_5.0_deepseek_PRM_k3-kl
PRMVR_deepseek_1.5B_deepscaler-ds_VR_coef_5.0_deepseek_PRM_k3-kl
VR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_1.0_fix_template_bug
PRM_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_fix_template_bug_hacking_KL
PRMVR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_1.0_800_ds_fix_template_bug
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_new_80k_ds_fix_template_bug
PRMVR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_0.1_fix_template_bug
PRMVR_roll_bs_64_lr_3e-7_kl_coef_1e-3_t_0.5_VR_coef_1
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_new_80k_ds
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_VR_coef_1
VR_roll_bs_32_kl_coef_1e-3_t_0.5
PRMVR_roll_bs_32_kl_coef_1e-2_t_0.6
PRMVR_roll_bs_32_kl_coef_1e-3_t_0.6_baseline_step_hacking
PRM_roll_bs_32_kl_coef_1e-3_t_0.6_hacking
VR_roll_bs_32_kl_coef_1e-3_t_0.6
PRMVR_vllm_roll_bs_128_kl_coef_1e-3_t_0.6
PRM+VR_vllm_rollout_bs_256_kl_coef_1e-3_t_0.6_hacking_non-specified_separator
PRM+VR_vllm_roll_bs_32_kl_coef_1e-3_bug_in_lr_sched
PRM+VR_vllm_rollout_bs_256_kl_coef_1e-3_hacking_non-specified_separator
PRM+VR_rollout_bs_32_kl_coef_1e-4
PRM+VR_rollout_bs_32_kl_coef_1e-3
PRM_kl_coef_1e-3_rollout_bs_32_hacking
PRM_kl_coef_1e-2_rollout_bs_32
Showing runs 1-24 of 24
Chart: train/match vs. Step (x-axis 0 to 800, y-axis 0 to 1), showing the first 10 runs:
VR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_1.0_fix_template_bug
PRMVR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_1.0_800_ds_fix_template_bug
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_new_80k_ds_fix_template_bug
PRMVR_roll_bs_64_lr_5e-7_kl_coef_1e-3_t_0.5_VR_coef_0.1_fix_template_bug
PRMVR_roll_bs_64_lr_3e-7_kl_coef_1e-3_t_0.5_VR_coef_1
PRMVR_roll_bs_64_kl_coef_1e-3_t_0.5_new_80k_ds
VR_roll_bs_32_kl_coef_1e-3_t_0.5
PRMVR_roll_bs_32_kl_coef_1e-2_t_0.6
PRMVR_roll_bs_32_kl_coef_1e-3_t_0.6_baseline_step_hacking
PRM_roll_bs_32_kl_coef_1e-3_t_0.6_hacking