agentboard
llm-agent-eval-gpt-35-turbo-all
Changma's workspace
Runs (1 of 1, 1 visualized): gpt_azure_gpt-35-turbo
[Figure: summary/avg_metrics_comparison. Title: Average Metrics for All Tasks Compared to Baseline Models. Metrics: Progress Rate (%), Success Rate (%). Series: Gpt-4, Text-davinci-003, Current Run, Codellama-34b, Lemur-70b.]