Evaluations
Score summary

Metric                                        Value     Change

General
  Cost                                        $0.25     ↘ -$0.34
  Tokens                                      23.66K    ↘ -8.24K
  Latency                                     4.24s     ↘ -11.81s

CorrectnessLLMJudge
  answer correct.true_count                   14        ↘ -3
  answer correct.true_fraction                0.58      ↘ -0.13
  answer correct.stderr                       0.1       ↗ +0.02

HallucinationLLMJudge
  follows from source.true_count              12        ↘ -12
  follows from source.true_fraction           0.5       ↘ -0.5

eval_retrieval
  first retrieval correct.true_count          15        ↗ +0
  first retrieval correct.true_fraction       0.63      ↗ +0

model_latency
  mean                                        1.73      ↘ -3.19

Correctness
  score.true_count                            3         ↗ +0
  score.true_fraction                         1         ↗ +0