agentboard
llm-agent-eval-gpt-4-all
gpt_azure_gpt-4
[Figure: Scienceworld Metrics Compared to Baseline Models (panel: scienceworld/metrics_comparison)]
Series: Current Run, lemur-70b, gpt-35-turbo, claude2, text-davinci-003, gpt-35-turbo-16k, llama2-70b, codellama-34b
Metrics: Progress Rate (%), Success Rate (%), Grounding Accuracy (%)