
Regression Report: wandb

[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=calculator_mask&cl=calculator_mask', 'wandb?tag=calculator_mask_direct_rewrad&cl=calculator_mask_direct_rewrad']]
Created on June 26 | Last edited on June 26

[Figure: four line plots, each with the y-axis labeled Episodic Return. Two are plotted against Steps (x-axis ticks 100 to 600) and two against Time (minutes) (x-axis ticks 50 to 150). In each pair, one panel has y-axis ticks from 0.2 to 0.8 and the other from 2 to 10.]
Legend: calculator_mask (10), calculator_mask_direct_rewrad (10)