Evaluations
Scorer output: output.MMLUOptionAccuracy.correct
Other columns (values not captured): Trace, Feedback, Status, inputs.model, inputs.self

f1_score  false_count  false_fraction  precision  recall
0.8261    40           0.2963          0.7037     1
0.8412    37           0.2741          0.7259     1
0.8362    38           0.2815          0.7185     1
0.8412    37           0.2687          0.7313     0.9899
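
The reported f1_score values are consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes F1 from the reported precision and recall of each row. It assumes that this standard definition is what the scorer uses, which the page does not state; the function name f1_from_pr and the REPORTED list are illustrative, not part of any library.

```python
# Recompute F1 from the reported precision/recall pairs.
# Assumption: f1_score is the standard harmonic mean of precision and recall.


def f1_from_pr(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1_score), copied from the evaluation rows.
REPORTED = [
    (0.7037, 1.0, 0.8261),
    (0.7259, 1.0, 0.8412),
    (0.7185, 1.0, 0.8362),
    (0.7313, 0.9899, 0.8412),
]

# Absolute difference between the recomputed and the reported F1 per row.
DIFFS = [abs(f1_from_pr(p, r) - f1) for p, r, f1 in REPORTED]

if __name__ == "__main__":
    for (p, r, f1), diff in zip(REPORTED, DIFFS):
        print(f"precision={p} recall={r} reported_f1={f1} diff={diff:.5f}")
```

Because the inputs are already rounded to four decimal places, small differences from the reported values are expected; a tolerance of about 1e-3 is a reasonable check.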