
Regression Report: train_reward_accelerate

[
  ['?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=train_reward/minibatch/error',
   '124M'],
  ['?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/score_total&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy_avg&metrics=ppo/policy/clipfrac_avg&metrics=ppo/policy/entropy_avg&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl_avg&metrics=ppo/val/clipfrac_avg&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred',
   'train_policy_accelerate?tag=v0.1.0-68-g2f3aa38&tag=tf_adam&tag=gpt2&cl=tf_adam,gpt2'],
  ['?we=tliu&wpn=cleanrl&xaxis=_step&ceik=label_dataset&cen=exp_name&metrics=train/loss',
   'train_reward_jax',
   'train_reward_accelerate']
]
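Each entry above pairs a URL-style filter query string with the experiment names it matches: `we` appears to be the wandb entity, `wpn` the project name, `xaxis` the x-axis key, `ceik`/`cen` the keys used to group runs, and repeated `metrics` entries the panels to plot. As a minimal sketch (not part of the original report), the snippet below decodes two of these filter strings with Python's standard library; the variable names are illustrative.

```python
# Decode openrlbenchmark-style filter query strings to show which
# wandb entity/project, grouping keys, and metrics each curve uses.
from urllib.parse import parse_qs

# Two of the filter strings from the report config above.
filters = [
    "?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step"
    "&ceik=task_id&cen=task.value.policy.initial_model"
    "&metrics=train_reward/minibatch/error",
    "?we=tliu&wpn=cleanrl&xaxis=_step&ceik=label_dataset"
    "&cen=exp_name&metrics=train/loss",
]

for f in filters:
    # parse_qs collects repeated keys (e.g. metrics) into a list.
    params = parse_qs(f.lstrip("?"))
    print("entity/project:", params["we"][0] + "/" + params["wpn"][0])
    print("x-axis key:    ", params["xaxis"][0])
    print("group-by keys: ", params["ceik"][0], params["cen"][0])
    print("metrics:       ", ", ".join(params["metrics"]))
    print()
```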
Created on August 27 | Last edited on August 27

[Figure: panel "Episodic Return" — x-axis Steps, 0 to 1.5k; y-axis −0.5 to 1.5. Legend: openrlbenchmark/lm-human-preferences/124M (40), tf_adam,gpt2 (9), tliu/cleanrl/train_reward_jax, tliu/cleanrl/train_reward_accelerate]