
Regression Report: train_policy_accelerate

Panel filters (openrlbenchmark query strings and run sets):

Run set 1 — openrlbenchmark/lm-human-preferences, run group 124M:
?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/score_total&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred

Run set 2 — openrlbenchmark/lm_human_preference_details, run groups tf_adam,gpt2-large and pt_adam,gpt2-large:
?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/score_total&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy_avg&metrics=ppo/policy/clipfrac_avg&metrics=ppo/policy/entropy_avg&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl_avg&metrics=ppo/val/clipfrac_avg&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred
train_policy_accelerate?tag=v0.1.0-58-g4f42012&tag=tf_adam&tag=gpt2-large&cl=tf_adam,gpt2-large
train_policy_accelerate?tag=v0.1.0-58-g4f42012&tag=pt_adam&tag=gpt2-large&cl=pt_adam,gpt2-large
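These query strings are the filters openrlbenchmark uses to pull the runs for each panel. As a minimal sketch of regenerating the comparison locally, something like the following should work; the `rlops_multi_metrics` entry point, the task ids (`sentiment descriptiveness tldr`), and the plot flags are assumptions not taken from this report, and may differ across openrlbenchmark versions.

```bash
# Sketch only: entry-point name, task ids, and plot flags are assumptions;
# the --filters query strings and run-set selectors are taken verbatim from
# this report's panel configuration.
pip install openrlbenchmark

python -m openrlbenchmark.rlops_multi_metrics \
    --filters '?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl&metrics=ppo/objective/entropy&metrics=ppo/objective/score_total&metrics=ppo/objective/kl_coef&metrics=ppo/ppo/loss/total&metrics=ppo/ppo/loss/value&metrics=ppo/ppo/loss/policy&metrics=ppo/ppo/policy/clipfrac&metrics=ppo/ppo/policy/entropy&metrics=ppo/ppo/returns/mean&metrics=ppo/ppo/policy/approxkl&metrics=ppo/ppo/val/clipfrac&metrics=ppo/ppo/val/error&metrics=ppo/ppo/val/mean&metrics=ppo/ppo/returns/var&metrics=ppo/ppo/val/vpred' \
        '124M' \
    --filters '?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=rewards.value.label_dataset&cen=exp_name&metrics=objective/scores&metrics=objective/kl&metrics=objective/entropy&metrics=objective/score_total&metrics=objective/kl_coef&metrics=ppo/loss/total&metrics=ppo/loss/value&metrics=ppo/loss/policy_avg&metrics=ppo/policy/clipfrac_avg&metrics=ppo/policy/entropy_avg&metrics=ppo/returns/mean&metrics=ppo/policy/approxkl_avg&metrics=ppo/val/clipfrac_avg&metrics=ppo/val/error&metrics=ppo/val/mean&metrics=ppo/returns/var&metrics=ppo/val/vpred' \
        'train_policy_accelerate?tag=v0.1.0-58-g4f42012&tag=tf_adam&tag=gpt2-large&cl=tf_adam,gpt2-large' \
        'train_policy_accelerate?tag=v0.1.0-58-g4f42012&tag=pt_adam&tag=gpt2-large&cl=pt_adam,gpt2-large' \
    --env-ids sentiment descriptiveness tldr \
    --env-ids sentiment descriptiveness tldr \
    --no-check-empty-runs \
    --pc.ncols 4 \
    --pc.ncols-legend 1 \
    --output-filename regression_report \
    --scan-history
```

Each `--filters` group pairs one W&B project query with its run-set selectors, so the first group pulls the original lm-human-preferences 124M baseline and the second pulls the tf_adam and pt_adam reproductions tagged at commit v0.1.0-58-g4f42012.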
Created on August 12 | Last edited on August 13

[Charts: three panels of Episodic Return vs. Steps (0–2k), one per task, comparing openrlbenchmark/lm-human-preferences/124M (40–41 runs) against train_policy_accelerate runs tagged tf_adam,gpt2-large (3–5 runs) and pt_adam,gpt2-large (10 runs).]