gen_eval_dataset-evaluation:v4
Columns: Model | answer correct.true_count | answer correct.true_fraction | answer correct.stderr | follows from source.true_count | follows from source.true_fraction | first retrieval correct.true_count | first retrieval correct.true_fraction | Avg. Latency | Run Date | Trials
Total Rows: 3