Changma's workspace

jericho (6)

Table columns: id, is_done, env.difficulty, env.goal, env.task_name, reward, grounding_accuracy, reward_wrt_step, trajectory
Values shown: 0, False, hard
[Figure: jericho/success_rate_w.r.t_difficulty. Title: Jericho Success Rate w.r.t Difficulty. Series: Success Rate For Easy Examples (%), Success Rate For Hard Examples (%). Models: Current Run, llama2-70b, lemur-70b, codellama-13b, codellama-34b, gpt-35-turbo, text-davinci-003, gpt-4.]
[Figure: jericho/metrics_comparison. Title: Jericho Metrics Compared to Baseline Models. Axes: Metric Name, Metric Value (%). Metrics: Progress Rate (%), Success Rate (%), Grounding Accuracy (%). Models: gpt-4, gpt-35-turbo, lemur-70b, llama2-70b.]
[Figure: jericho/task_reward_w.r.t_steps. Title: Average Progress Rate (%) w.r.t Steps for jericho Tasks. Axes: steps, score. Legend (Model Name, Is Baseline): Current Run, False; gpt-35-turbo, True; text-davinci-003, True; llama2-70b, True; gpt-4, True; lemur-70b, True; codellama-13b, True; codellama-34b, True.]
[Figure: jericho/progress_score_w.r.t_difficulty. Title: Jericho Progress Rate w.r.t Difficulty. Series: Progress Rate For Easy Examples (%), Progress Rate For Hard Examples (%). Models: codellama-13b, llama2-70b, Current Run, lemur-70b, codellama-34b, gpt-35-turbo, text-davinci-003, gpt-4.]
pddl (6)
alfworld (6)