Regression Report: 124M

[['?we=openrlbenchmark&wpn=lm-human-preferences&ceik=task_id&cen=task.value.policy.initial_model&metrics=ppo/objective/score&metrics=ppo/objective/kl', '124M']]
Created on June 23 | Last edited on June 23

[Figure: four line plots of Episodic Return, two plotted against Steps and two against Time (minutes). Legend: openrlbenchmark/lm-human-preferences/124M ({}), labeled 40 and 41.]