agentboard
Projects
llm-agent-eval-gpt-4-all
Changma's workspace
gpt_azure_gpt-4
[Figure: Average Progress Rate (%) w.r.t Steps for scienceworld Tasks. Panel: scienceworld/task_reward_w.r.t_steps. Axes: steps (ticks 0 to 30) and score (ticks 0 to 60). Legend (Model Name, Is Baseline): Current Run (False); gpt-35-turbo-16k, gpt-35-turbo, codellama-34b, claude2, lemur-70b, llama2-70b, text-davinci-003 (all True).]