Evaluations

Scorers: evaluate_agent_routing, evaluate_final_output
Columns without captured values: inputs, output, Trace, Feedback, Status, model, self

correct       correct          score     correct       correct
true_count    true_fraction    mean      true_count    true_fraction
1             0.3333           0.3333    0             0
3             1                1         0             0
2             0.6667           0.6667    2             0.6667
3             1                1         2             0.6667
3             1                1         3             1
1             0.3333           0.3333    0             0
3             1                1         0             0
2             0.6667           0.6667    3             1
3             1                1         2             0.6667
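
The true_count and true_fraction values above pair up as 1 with 0.3333, 2 with 0.6667, and 3 with 1. That is consistent with each evaluation counting True results over three examples, though the source does not state the number of examples. A minimal sketch of that arithmetic, assuming three boolean results per evaluation and rounding to four decimals; the helper names summarize_bools and mean_score are hypothetical and not part of any library:

```python
from typing import Dict, Sequence, Union


def summarize_bools(results: Sequence[bool]) -> Dict[str, Union[int, float]]:
    """Hypothetical helper: count True results and their share of all results."""
    true_count = sum(1 for r in results if r)
    # Guard against an empty run so the division is always defined.
    true_fraction = true_count / len(results) if results else 0.0
    return {"true_count": true_count, "true_fraction": round(true_fraction, 4)}


def mean_score(scores: Sequence[float]) -> float:
    """Hypothetical helper: arithmetic mean of numeric scores, rounded to 4 places."""
    return round(sum(scores) / len(scores), 4) if scores else 0.0


# Assumed example: one correct result out of three.
one_of_three = summarize_bools([True, False, False])
# Assumed example: two correct results out of three.
two_of_three = summarize_bools([True, True, False])
```

Under these assumptions, a run with one True out of three yields a true_count of 1 and a true_fraction of 0.3333, matching the first data row.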