Evaluations
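Columns Trace, Feedback, Status, inputs.model, and inputs.self carry no values here. The output summary columns are:

| output.get_answer_correctness.answer_correctness.true_count | output.get_answer_correctness.answer_correctness.true_fraction | output.model_latency.mean | output.model_output.completion_tokens.mean | output.model_output.prompt_tokens.mean |
| --- | --- | --- | --- | --- |
| 2 | 1 | 93.7584 | 723.5 | 8490.5 |
| 2 | 1 | 98.4047 | 718.5 | 7649.5 |
| N/A | N/A | N/A | N/A | N/A |
| 186 | 0.6327 | 278.5907 | 719.9082 | 7637.3435 |
| N/A | N/A | N/A | N/A | N/A |
| 202 | 0.6871 | 215.8164 | 739.2619 | 8109.2245 |
| N/A | N/A | N/A | N/A | N/A |
| N/A | N/A | N/A | N/A | N/A |
| 233 | 0.7925 | 215.5375 | 811.8299 | 8124.5816 |
| 249 | 0.8469 | 203.746 | 889.3946 | 7469.4116 |
| 250 | 0.8503 | 230.4384 | 860.0646 | 7559.2959 |
| N/A | N/A | N/A | N/A | N/A |
| 241 | 0.8197 | 197.8113 | 814.8605 | 8137.898 |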
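
The true_count and true_fraction values above can be cross-checked against each other. The following is a minimal sketch, assuming that true_fraction is simply true_count divided by the number of scored examples (the source does not state this definition); the helper name `implied_examples` is hypothetical and only used for illustration.

```python
# (true_count, true_fraction) pairs copied from the rows above; N/A rows are skipped.
rows = [
    (2, 1.0),
    (2, 1.0),
    (186, 0.6327),
    (202, 0.6871),
    (233, 0.7925),
    (249, 0.8469),
    (250, 0.8503),
    (241, 0.8197),
]


def implied_examples(true_count: int, true_fraction: float) -> int:
    """Estimate the number of scored examples for one evaluation.

    Assumption (not confirmed by the source): true_fraction = true_count / n.
    Rounding absorbs the four-decimal precision of the reported fraction.
    """
    return round(true_count / true_fraction)


sizes = [implied_examples(count, fraction) for count, fraction in rows]
print(sizes)
```

Under that assumption, the first two rows would correspond to runs over 2 examples and the remaining rows to runs over the same, larger number of examples, which would explain why the small runs report a true_fraction of 1.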