agentboard
llm-agent-eval-llama2-13b-all
Runs: 3 (1 visualized), listed as vllm_meta-llama/Llama-2-13b-chat-hf
webshop (6 panels)
jericho (6 panels)
[Table: runs.summary["jericho/predictions"]. Columns: id, is_done, env.difficulty, env.goal, env.task_name, reward, grounding_accuracy, reward_wrt_step, trajectory]
[Figure: jericho/success_rate_w.r.t_difficulty, "Jericho Success Rate w.r.t Difficulty". Axes: Success Rate For Easy Examples (%), Success Rate For Hard Examples (%). Series: Current Run, llama2-70b, lemur-70b, codellama-13b, codellama-34b, gpt-35-turbo, text-davinci-003, gpt-4]
[Figure: jericho/metrics_comparison, "Jericho Metrics Compared to Baseline Models". Metrics: Progress Rate (%), Success Rate (%), Grounding Accuracy (%). Series: gpt-4, gpt-35-turbo, lemur-70b, llama2-70b]
[Table: runs.summary["jericho/metrics"]. Columns: Metric Name, Metric Value (%)]
[Figure: jericho/task_reward_w.r.t_steps, "Average Progress Rate (%) w.r.t Steps for jericho Tasks". Axes: steps (x), score (y). Legend (Model Name, Is Baseline): Current Run, False; gpt-35-turbo, True; text-davinci-003, True; llama2-70b, True; gpt-4, True; lemur-70b, True; codellama-13b, True; codellama-34b, True]
[Figure: jericho/progress_score_w.r.t_difficulty, "Jericho Progress Rate w.r.t Difficulty". Axes: Progress Rate For Easy Examples (%), Progress Rate For Hard Examples (%). Series: codellama-13b, llama2-70b, Current Run, lemur-70b, codellama-34b, gpt-35-turbo, text-davinci-003, gpt-4]
pddl (6 panels)
alfworld (6 panels)