Evaluations
Results table (8 rows, not captured). Columns: inputs, output, MMLUOptionAccuracy / correct, Trace, Feedback, Status, model, self, f1_score, false_count.
Score summary
General
  Metric    Value     Change
  Cost      $1.52     +$1.52
  Tokens    555.06K   +1.57K
  Latency   7m48s     +2m58s
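The Change column reads most naturally as the difference from the evaluation being compared against. Assuming change = current - baseline (an assumption; the dashboard does not say so explicitly), the baseline's General figures can be backed out as in this sketch. `parse_duration`, `format_duration`, and `baseline` are hypothetical helpers written only for this illustration.

```python
# Sketch: recover baseline values from the dashboard's current values and
# changes, ASSUMING change = current - baseline (not stated by the dashboard).


def parse_duration(text: str) -> int:
    """Convert an 'XmYs' string such as '7m48s' into total seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def format_duration(total_seconds: int) -> str:
    """Convert total seconds back into the dashboard's 'XmYs' form."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m{seconds}s"


def baseline(current: float, change: float) -> float:
    """Hypothetical helper: baseline value under the assumption above."""
    return current - change


# Values copied from the General summary above.
cost = round(baseline(1.52, 1.52), 2)        # dollars
tokens_k = round(baseline(555.06, 1.57), 2)  # thousands of tokens
latency = format_duration(parse_duration("7m48s") - parse_duration("2m58s"))

print(cost, tokens_k, latency)
```

Under that assumption the baseline run cost $0.00, used about 553.49K tokens, and took 4m50s.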
MMLUOptionAccuracy
  Metric                   Value   Change
  correct.true_count       120     +22
  correct.false_count      15      -22
  correct.true_fraction    0.89    +0.16
  correct.false_fraction   0.11    -0.16
  correct.stderr           0.03    -0.01
  correct.precision        0.89    +0.16
  correct.recall           1       +0.01
  correct.f1_score         0.94    +0.1
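The MMLUOptionAccuracy figures are mutually consistent: the fractions and stderr follow from the two counts, the reported precision equals true_fraction, and F1 follows from the reported precision and recall. This sketch recomputes them, assuming stderr is the usual binomial standard error sqrt(p(1 - p) / n) (a sample-variance version with n - 1 rounds to the same 0.03) and F1 is the harmonic mean of precision and recall. The baseline line additionally assumes change = current - baseline; `f1_score` here is a hypothetical local helper, not a library call.

```python
import math

# Counts copied from the MMLUOptionAccuracy summary above.
true_count, false_count = 120, 15
n = true_count + false_count

true_fraction = true_count / n
false_fraction = false_count / n
# Binomial standard error of a proportion (assumed formula).
stderr = math.sqrt(true_fraction * (1 - true_fraction) / n)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall for the current run.
f1 = f1_score(0.89, 1.0)
# Baseline precision/recall, ASSUMING change = current - baseline.
baseline_f1 = f1_score(0.89 - 0.16, 1.0 - 0.01)

print(n, round(true_fraction, 2), round(false_fraction, 2),
      round(stderr, 2), round(f1, 2), round(baseline_f1, 2))
```

Backing out the counts the same way gives a baseline of 98 correct and 37 incorrect answers on the same 135 questions, and a baseline F1 of about 0.84, which matches the +0.1 change.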
model_latency
  Metric   Value   Change
  mean     3.33    +1.34