agentboard
llm-agent-eval-gpt-4-all
Changma's workspace
gpt_azure_gpt-4
runs.summary["scienceworld/metrics"]
Metric Name: Metric Value (%)
Progress Rate: 0.7551
Success Rate: 0.4
[Figure: scienceworld/metrics_comparison. Title: Scienceworld Metrics Compared to Baseline Models. Metrics: Progress Rate (%), Success Rate (%), Grounding Accuracy (%). Models: Current Run, gpt-35-turbo, text-davinci-003, llama2-70b.]
[Figure: scienceworld/task_reward_w.r.t_steps. Title: Average Progress Rate (%) w.r.t Steps for scienceworld Tasks. Axes: steps, score. Legend (Model Name, Is Baseline): Current Run (False); gpt-35-turbo-16k, gpt-35-turbo, codellama-34b, claude2, lemur-70b, llama2-70b, text-davinci-003 (all True).]
[Figure: scienceworld/progress_score_w.r.t_difficulty. Title: Scienceworld Progress Rate w.r.t Difficulty. Series: Progress Rate For Easy Examples (%), Progress Rate For Hard Examples (%). Models: gpt-35-turbo-16k, llama2-70b, codellama-34b, text-davinci-003, claude2, gpt-35-turbo, lemur-70b, Current Run.]
[Figure: scienceworld/success_rate_w.r.t_difficulty. Title: Scienceworld Success Rate w.r.t Difficulty. Series: Success Rate For Easy Examples (%), Success Rate For Hard Examples (%). Models: gpt-35-turbo-16k, codellama-34b, llama2-70b, lemur-70b, text-davinci-003, claude2, gpt-35-turbo, Current Run.]
runs.summary["scienceworld/predictions"]
Columns: id, is_done, env.difficulty, env.goal, env.task_name, reward, grounding_accuracy, reward_wrt_step, trajectory