Regression Report: 124M

Created on July 9 | Last edited on July 9

Run set: W&B entity `openrlbenchmark`, project `lm-human-preferences`, runs keyed by `task_id` and labeled by `task.value.policy.initial_model` (124M). Tracked metrics:

- `ppo/objective/score`, `ppo/objective/score_total`, `ppo/objective/kl`, `ppo/objective/kl_coef`, `ppo/objective/entropy`
- `ppo/ppo/loss/policy`, `ppo/ppo/loss/value`, `ppo/ppo/loss/total`
- `ppo/ppo/policy/entropy`, `ppo/ppo/policy/approxkl`, `ppo/ppo/policy/clipfrac`
- `ppo/ppo/val/mean`, `ppo/ppo/val/var`, `ppo/ppo/val/vpred`, `ppo/ppo/val/error`, `ppo/ppo/val/clipfrac`, `ppo/ppo/val/var_explained`
- `ppo/ppo/returns/mean`, `ppo/ppo/returns/var`
- `train_reward/minibatch/loss`, `train_reward/minibatch/error`
- `ppo/elapsed/fps`, `ppo/elapsed/time`, `ppo/global_step`
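To pull the same curves outside the report, here is a minimal sketch using the W&B public API. The entity, project, and metric keys are taken from the panel configuration above; the particular keys fetched and the printing loop are illustrative, not part of the original report.

```python
# Minimal sketch: fetch this report's metrics via the W&B public API.
# Entity/project and metric keys come from the panel config above;
# the selection of keys and the output format are illustrative.
import wandb

api = wandb.Api()
runs = api.runs("openrlbenchmark/lm-human-preferences")

keys = [
    "ppo/objective/score",
    "ppo/objective/kl",
    "ppo/objective/kl_coef",
    "ppo/ppo/returns/mean",
]

for run in runs:
    # `history(keys=...)` returns a pandas DataFrame of the logged values.
    df = run.history(keys=keys)
    if not df.empty:
        print(run.name, run.config.get("task_id"),
              df["ppo/objective/score"].iloc[-1])
```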

[Charts: six panels plotting Episodic Return against Steps (500 to 1.5k) for the 124M run set.]
Run sets: `openrlbenchmark/lm-human-preferences/124M`, three groups containing 40, 41, and 20 runs respectively.