Changma's workspace

scienceworld
6
Metric Name      Metric Value (%)
Progress Rate    0.7551
Success Rate     0.4
scienceworld/metrics_comparison
[Chart: Scienceworld Metrics Compared to Baseline Models. Series: Progress Rate (%), Success Rate (%), Grounding Accuracy (%). Models: Current Run, gpt-35-turbo, text-davinci-003, llama2-70b.]
scienceworld/task_reward_w.r.t_steps
[Chart: Average Progress Rate (%) w.r.t Steps for scienceworld Tasks. X axis: steps. Y axis: score. Legend (Model Name, Is Baseline): Current Run, False; gpt-35-turbo-16k, True; gpt-35-turbo, True; codellama-34b, True; claude2, True; lemur-70b, True; llama2-70b, True; text-davinci-003, True.]
scienceworld/progress_score_w.r.t_difficulty
[Chart: Scienceworld Progress Rate w.r.t Difficulty. Series: Progress Rate For Easy Examples (%), Progress Rate For Hard Examples (%). Models: gpt-35-turbo-16k, llama2-70b, codellama-34b, text-davinci-003, claude2, gpt-35-turbo, lemur-70b, Current Run.]
scienceworld/success_rate_w.r.t_difficulty
[Chart: Scienceworld Success Rate w.r.t Difficulty. Series: Success Rate For Easy Examples (%), Success Rate For Hard Examples (%). Models: gpt-35-turbo-16k, codellama-34b, llama2-70b, lemur-70b, text-davinci-003, claude2, gpt-35-turbo, Current Run.]
Table columns: id, is_done, env.difficulty, env.goal, env.task_name, reward, grounding_accuracy, reward_wrt_step, trajectory
29
babyai
6