
Tulu3 8B RM

Created on March 21 | Last edited on March 21

Figure: Tulu3 8B RM training curves (x-axis: Steps). Panels: Accuracy (train/rm/accuracy), Loss (train/rm/loss), Chosen Rewards (train/rm/chosen_rewards), Rejected Rewards (train/rm/rejected_rewards), Reward Margin (train/rm/reward_margin), Learning Rate (train/rm/lr).
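The panels track standard pairwise reward-model quantities. As a hedged sketch, assuming the usual Bradley-Terry objective (loss = -log sigmoid(r_chosen - r_rejected)), the per-batch values could be computed as below. This is not taken from the open_instruct code, and the function names `softplus` and `rm_batch_metrics` are hypothetical. The learning-rate panel is not computed here because it comes from the optimizer's scheduler.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rm_batch_metrics(chosen_rewards: list[float], rejected_rewards: list[float]) -> dict[str, float]:
    """Assumed definitions of pairwise RM metrics for one batch (hypothetical helper)."""
    if not chosen_rewards or len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("need equal-length, non-empty reward lists")
    n = len(chosen_rewards)
    # Margin of the chosen response's reward over the rejected one, per pair.
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        # Bradley-Terry loss: -log sigmoid(m) == softplus(-m), averaged over pairs.
        "loss": sum(softplus(-m) for m in margins) / n,
        # Fraction of pairs where the chosen response scores strictly higher.
        "accuracy": sum(1 for m in margins if m > 0) / n,
        "chosen_rewards": sum(chosen_rewards) / n,
        "rejected_rewards": sum(rejected_rewards) / n,
        "reward_margin": sum(margins) / n,
    }
```

For example, `rm_batch_metrics([2.0, 0.5], [1.0, 1.5])` gives an accuracy of 0.5 and a mean reward margin of 0.0, since one pair is ranked correctly and the other is not. Under these assumed definitions, rising accuracy and reward margin alongside falling loss is the expected signature of a reward model that is learning to separate chosen from rejected responses.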