Traces
| Trace | inputs: model | inputs: self | output: ...true_count | output: ...true_fraction | output: ...mean | output: ...mean | output: ...true_count | output: ...true_fraction |
|---|---|---|---|---|---|---|---|---|
| Evaluation.evaluate 6c25 | gpt-4:v0 | rag-eval-gpt-4:v0 | N/A | N/A | N/A | N/A | N/A | N/A |
| Evaluation.evaluate 3a4c | gpt-4:v0 | math-eval-gpt-4:v0 | N/A | N/A | N/A | N/A | 8 | 0.8 |
| Evaluation.evaluate a24c | gpt-4:v0 | essay-eval-gpt-4:v0 | 3 | 0.3 | 0.9 | 1.3 | N/A | N/A |
| Evaluation.evaluate ccb2 | claude-3.5-sonnet:v0 | rag-eval-claude-3.5-sonnet:v0 | N/A | N/A | N/A | N/A | N/A | N/A |
| Evaluation.evaluate 089e | claude-3.5-sonnet:v0 | math-eval-claude-3.5-sonnet:v0 | N/A | N/A | N/A | N/A | 7 | 0.7 |
| Evaluation.evaluate 729f | claude-3.5-sonnet:v0 | essay-eval-claude-3.5-sonnet:v1 | 2 | 0.2 | 1 | 1.4 | N/A | N/A |
| Evaluation.evaluate 0c75 | claude-3.5-sonnet:v0 | essay-eval-claude-3.5-sonnet:v0 | N/A | N/A | N/A | N/A | N/A | N/A |