agentboard
llm-agent-eval-gpt-4-all
Runs (1 of 1, 1 visualized): gpt_azure_gpt-4
[Figure: Scienceworld Success Rate w.r.t Difficulty (panel: scienceworld/success_rate_w.r.t_difficulty). Axes: Success Rate For Easy Examples (%) and Success Rate For Hard Examples (%), with tick values from 0 to 0.7. Legend: gpt-35-turbo-16k, codellama-34b, llama2-70b, lemur-70b, text-davinci-003, claude2, gpt-35-turbo, Current Run.]