Evaluations
| Trace | inputs: model | inputs: self | mean | exact_match: true_count | exact_match: true_fraction | is_correct: true_count | is_correct: true_fraction |
|---|---|---|---|---|---|---|---|
| Evaluation.evaluate a211 | MisinformationEvaluator:v2 | Evaluation:v4 | N/A | N/A | N/A | 50 | 1 |
| Evaluation.evaluate 3457 | MisinformationEvaluator:v1 | Evaluation:v4 | N/A | N/A | N/A | N/A | N/A |
| Evaluation.evaluate fb4a | MisinformationEvaluator:v0 | Evaluation:v4 | N/A | N/A | N/A | 48 | 0.96 |
| Evaluation.evaluate 64e7 | CorrectnessEvaluator:v2 | Evaluation:v3 | N/A | N/A | N/A | 69 | 0.69 |
| Evaluation.evaluate 10b5 | CorrectnessEvaluator:v1 | Evaluation:v3 | N/A | N/A | N/A | 51 | 0.51 |
| Evaluation.evaluate 7714 | CorrectnessEvaluator:v0 | Evaluation:v3 | N/A | N/A | N/A | 59 | 0.59 |
| Evaluation.evaluate 32f9 | PairWiseEvaluator:v6 | Evaluation:v2 | N/A | 85 | 0.85 | N/A | N/A |
| Evaluation.evaluate a6ca | PairWiseEvaluator:v5 | Evaluation:v2 | N/A | 44 | 0.44 | N/A | N/A |
| Evaluation.evaluate 4d5c | PairWiseEvaluator:v6 | Evaluation:v1 | N/A | 2 | 1 | N/A | N/A |
| Evaluation.evaluate 05d7 | PairWiseEvaluator:v5 | Evaluation:v1 | N/A | 2 | 1 | N/A | N/A |
| Evaluation.evaluate 2972 | PairWiseEvaluator:v4 | Evaluation:v0 | N/A | N/A | N/A | N/A | N/A |
| Evaluation.evaluate 52b0 | PairWiseEvaluator:v3 | Evaluation:v0 | N/A | N/A | N/A | N/A | N/A |
| Evaluation.evaluate 4c5d | EssayEvaluator:v8 | essay_scorer_small-evaluation:v3 | N/A | 4 | 0.4 | N/A | N/A |
| Evaluation.evaluate f027 | EssayEvaluator:v7 | essay_scorer_small-evaluation:v3 | N/A | 4 | 0.4 | N/A | N/A |
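
The true_count and true_fraction values listed above look like summaries of per-example boolean scores. The sketch below shows one way such a pair could be computed and how the number of scored examples can be backed out of a row. It assumes true_fraction is simply true_count divided by the number of scored examples, which is consistent with the rows on this page but is not stated on it. The function names `summarize_bools` and `implied_examples` are hypothetical helpers for illustration, not part of any library.

```python
from typing import Optional, Sequence


def summarize_bools(values: Sequence[bool]) -> dict:
    """Summarize per-example boolean scores (hypothetical helper).

    Assumes true_fraction = true_count / number of scored examples.
    """
    if not values:
        raise ValueError("need at least one scored example")
    true_count = sum(1 for v in values if v)
    return {
        "true_count": true_count,
        "true_fraction": true_count / len(values),
    }


def implied_examples(true_count: Optional[int],
                     true_fraction: Optional[float]) -> Optional[int]:
    """Back out the number of scored examples from one summary row.

    Returns None when the row shows N/A (passed in as None) or when
    the fraction is zero, since the example count cannot be recovered.
    """
    if true_count is None or true_fraction is None or true_fraction == 0:
        return None
    return round(true_count / true_fraction)


if __name__ == "__main__":
    # 48 passing scores out of 50, matching the shape of the fb4a row.
    print(summarize_bools([True] * 48 + [False] * 2))
    # Implied dataset sizes for a few rows from the listing.
    for count, fraction in [(48, 0.96), (69, 0.69), (2, 1), (4, 0.4)]:
        print(count, fraction, implied_examples(count, fraction))
```

Under that assumption, rows with the same Evaluation version imply the same example count, which is a quick sanity check when comparing model versions against each other: fractions are only directly comparable when they come from the same evaluation dataset.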