Evaluations
| deep_research_scores.comprehensiveness | deep_research_scores.insight | deep_research_scores.instruction_following | deep_research_scores.overall | deep_research_scores.readability | model_latency.mean |
|---|---|---|---|---|---|
| 0.4273 | 0.4146 | 0.4928 | 0.4462 | 0.4828 | 58.4402 |
| N/A | N/A | N/A | N/A | N/A | N/A |
| N/A | N/A | N/A | N/A | N/A | N/A |
| 0.4434 | 0.4306 | 0.4925 | 0.4573 | 0.4908 | 221.9808 |
| 0.417 | 0.3841 | 0.4629 | 0.4318 | 0.459 | 1167.3801 |
| N/A | N/A | N/A | N/A | N/A | N/A |
| N/A | N/A | N/A | N/A | N/A | N/A |
| 0.2645 | 0.242 | 0.3217 | 0.2781 | 0.3183 | 205.6566 |
| 0.3675 | 0.357 | 0.4113 | 0.3786 | 0.401 | 263.4442 |
| 0.392 | 0.3767 | 0.4527 | 0.4084 | 0.4444 | 150.1829 |
| 0.3106 | 0.2854 | 0.403 | 0.3367 | 0.3976 | 211.1049 |
| 0.344 | 0.3418 | 0.3985 | 0.365 | 0.4035 | 177.3765 |
| 0.3961 | 0.378 | 0.4665 | 0.4152 | 0.4585 | 85.2468 |
| 0.4079 | 0.3919 | 0.5067 | 0.438 | 0.4704 | 197.773 |