Regression Report: train_reward_jax
[['?we=openrlbenchmark&wpn=lm-human-preferences&xaxis=_step&ceik=task_id&cen=task.value.policy.initial_model&metrics=train_reward/minibatch/error', '124M'], ['?we=openrlbenchmark&wpn=lm_human_preference_details&xaxis=_step&ceik=label_dataset&cen=exp_name&metrics=train/loss', 'train_reward_accelerate?tag=v0.1.0-76-gfbf1f0c&tag=tf_adam&tag=gpt2&cl=tf_adam,gpt2', 'train_reward_jax?tag=v0.1.0-75-g8cc6065&tag=tf_adam&tag=gpt2&cl=jax,tf_adam,gpt2']]
Created on August 27 | Last edited on August 27
openrlbenchmark/lm-human-preferences/124M ({}): 40
tf_adam,gpt2: 9
jax,tf_adam,gpt2: 20