Evaluations
Columns: inputs, output, Correctness, DecisionScorer, score, Trace, Feedback, Status, model, self
true_count  true_fraction  accuracy  decision_match_rate  f1
N/A         N/A            1         1                    1
N/A         N/A            0.9167    0.9167               0.9485
N/A         N/A            0.9333    0.9333               0.9592
N/A         N/A            0.75      0.75                 0.8485
N/A         N/A            0.55      0.55                 0.7097
N/A         N/A            0.9       0.9                  0.9375
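The page does not say how these metrics are computed. In every row, accuracy equals decision_match_rate, while f1 is higher. The sketch below assumes both accuracy and decision_match_rate are the fraction of cases where the predicted decision matches the expected one, and that f1 is the standard binary F1 on the positive class. The function name `binary_metrics` and the example data are hypothetical, not taken from the evaluation tool.

```python
from typing import Dict, Sequence


def binary_metrics(expected: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Hypothetical helper: accuracy (assumed equal to decision_match_rate) and positive-class F1."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("expected and predicted must be non-empty and of equal length")
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e and p)          # predicted True, expected True
    fp = sum(1 for e, p in pairs if not e and p)      # predicted True, expected False
    fn = sum(1 for e, p in pairs if e and not p)      # predicted False, expected True
    matches = sum(1 for e, p in pairs if e == p)      # any agreement, including true negatives
    accuracy = matches / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    # Assumption: decision_match_rate is the same quantity as accuracy.
    return {"accuracy": accuracy, "decision_match_rate": accuracy, "f1": f1}


# Hypothetical data: 20 cases, 11 expected True and 9 expected False, all predicted True.
metrics = binary_metrics([True] * 11 + [False] * 9, [True] * 20)
print({k: round(v, 4) for k, v in metrics.items()})
```

With 11 true positives, 9 false positives and no negatives predicted, accuracy is 11/20 = 0.55 and F1 is 22/31 ≈ 0.7097. This is one combination consistent with the 0.55 row. The page does not show the actual counts. F1 ignores true negatives, which is why it can sit above accuracy when most errors are false positives.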