Evaluations
| Trace | inputs.model | inputs.self | output.essay_score.correct.true_count | output.essay_score.correct.true_fraction | output.essay_score.mae.mean | output.essay_score.squared_error.mean | output.math_score.correct.true_count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Evaluation.evaluate (6c25) | gpt-4:v0 | rag-eval-gpt-4:v0 | N/A | N/A | N/A | N/A | N/A |
| Evaluation.evaluate (3a4c) | gpt-4:v0 | math-eval-gpt-4:v0 | N/A | N/A | N/A | N/A | 8 |
| Evaluation.evaluate (a24c) | gpt-4:v0 | essay-eval-gpt-4:v0 | 3 | 0.3 | 0.9 | 1.3 | N/A |
| Evaluation.evaluate (ccb2) | claude-3.5-sonnet:v0 | rag-eval-claude-3.5-sonnet:v0 | N/A | N/A | N/A | N/A | N/A |
| Evaluation.evaluate (089e) | claude-3.5-sonnet:v0 | math-eval-claude-3.5-sonnet:v0 | N/A | N/A | N/A | N/A | 7 |
| Evaluation.evaluate (729f) | claude-3.5-sonnet:v0 | essay-eval-claude-3.5-sonnet:v1 | 2 | 0.2 | 1 | 1.4 | N/A |
| Evaluation.evaluate (0c75) | claude-3.5-sonnet:v0 | essay-eval-claude-3.5-sonnet:v0 | N/A | N/A | N/A | N/A | N/A |