Regression Report: train_policy_accelerate

Runsets compared (wandb filter queries):

- `124M`: `?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred`
- `train_policy_accelerate?tag=v0.1.0-20-gd63c6c3`: `?we=costa-huang&wpn=cleanrl&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl&metrics=ppo/val/clipfrac&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred`
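The runset pairs above follow the filter-query format consumed by the openrlbenchmark CLI. As a sketch of how this comparison might be regenerated (the `rlops` module name and the `--output-filename` flag are assumptions about that tool's interface and may differ across versions; a logged-in wandb session is required):

```shell
# Sketch: regenerate the regression comparison with the openrlbenchmark CLI.
# Each --filters takes a wandb filter query followed by the runset key(s);
# the metric lists are abbreviated here — the report tracks the full set above.
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl' \
        '124M' \
    --filters '?we=costa-huang&wpn=cleanrl&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl' \
        'train_policy_accelerate?tag=v0.1.0-20-gd63c6c3' \
    --output-filename regression_report
```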
Created on July 16 | Last edited on July 16

[Charts: three panels of Episodic Return vs. Steps (up to ~1.5k steps), comparing two runsets: openrlbenchmark/lm-human-preferences/124M ({}) and costa-huang/cleanrl/train_policy_accelerate ({'tag': ['v0.1.0-20-gd63c6c3']})]