Changma's workspace

scienceworld/task_reward_w.r.t_steps
[Figure: Average Progress Rate (%) w.r.t. Steps for scienceworld Tasks. x-axis: steps; y-axis: score. Legend (Model Name, Is Baseline): Current Run, False; gpt-35-turbo-16k, True; gpt-35-turbo, True; codellama-34b, True; claude2, True; lemur-70b, True; llama2-70b, True; text-davinci-003, True.]